Chapter 3 — Diffusion Modeling with Pupil-Linked Arousal (Response-Signal Design)
Abstract
This chapter presents a hierarchical Wiener diffusion decision model (DDM) for a response-signal change-detection task in older adults. The primary model maps task difficulty to drift rate (v) and boundary separation (a), with starting-point bias (z) varying by task and effort but constant across difficulty levels (reflecting the randomized trial design), and small condition effects on non-decision time (t₀). We report comprehensive quality assurance checks, manipulation checks independent of the DDM, model comparison via LOO cross-validation, and extensive posterior predictive checks with emphasis on subject-wise mid-body RT quantiles.
1 Introduction
1.2 Insights from Drift-Diffusion Modeling of Aging
To disentangle these component processes of decision-making, researchers increasingly turn to computational modeling approaches. One powerful framework is the Drift Diffusion Model (DDM), which provides a quantitative decomposition of choice reaction time data into psychologically interpretable parameters (Ratcliff, 1978; Ratcliff & McKoon, 2008; Voss et al., 2004). In a forced-choice decision task, the DDM conceptualizes the process as a gradual, noisy accumulation of evidence toward one of two decision boundaries (representing the response options). The key model parameters include:
- Drift rate (\(v\)) – the average speed of evidence accumulation toward the correct decision. This reflects the quality or efficiency of information processing; a higher drift rate means the decision maker can extract and use task-relevant information more quickly.
- Decision boundary (\(a\)) – the amount of evidence required to commit to a choice, often interpreted as response caution or threshold. A larger boundary separation indicates a more cautious strategy (waiting for more evidence before deciding), whereas a smaller boundary implies a more impulsive or speed-emphasizing strategy.
- Starting point (\(z\)) – the initial bias or predisposition toward one option or the other before evidence is accumulated. If the starting point is centered (typically 0.5 in relative units), there is no pre-existing bias; deviations from center indicate an a priori bias to favor a particular response (e.g., a bias to say “yes” vs “no” in a detection task).
- Non-decision time (\(t_0\)) – the duration of processes outside of the evidence accumulation itself, such as perceptual encoding of the stimulus and the motor execution of the response (Voss et al., 2004). Non-decision time accounts for aspects of the reaction time that are not decision-related.
Formally, the evidence accumulation process follows the stochastic differential equation (Ratcliff, 1978; Ratcliff & McKoon, 2008):
\[dX(t) = v \, dt + dW(t) \tag{1}\]
where \(X(t)\) is the accumulated evidence at time \(t\), \(v\) is the drift rate (evidence strength), and \(dW(t)\) is the increment of a standard Wiener process (within-trial noise with unit variance). The process starts at \(X(0) = z \cdot a\), where \(z \in (0,1)\) is the starting-point bias (expressed as a proportion of boundary separation) and \(a > 0\) is the boundary separation (Ratcliff & McKoon, 2008). The decision process terminates when \(X(t)\) reaches either the upper boundary (\(X(t) = a\)) or the lower boundary (\(X(t) = 0\)). The total reaction time (RT) is the sum of the decision time and the non-decision time (\(t_0\)):
\[\text{RT} = t_{\text{decision}} + t_0 \tag{2}\]
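As an illustration of Equations 1 and 2, the following minimal Python sketch simulates single trials with Euler-Maruyama steps. It is not part of the analysis pipeline: the function names, the step size `dt`, the time limit `max_t`, and the parameter values in the example call are all assumptions chosen for demonstration.

```python
import math
import random

def simulate_trial(v, a, z, t0, dt=0.001, max_t=5.0, rng=random):
    """Simulate one trial of Eq. 1-2 with Euler-Maruyama steps.

    Returns (rt, hit_upper); both are None if no boundary is reached by max_t.
    """
    x = z * a                      # start at X(0) = z * a
    t = 0.0
    sd = math.sqrt(dt)             # SD of a unit-variance Wiener increment over dt
    while t < max_t:
        x += v * dt + sd * rng.gauss(0.0, 1.0)
        t += dt
        if x >= a:
            return t + t0, True    # upper boundary reached
        if x <= 0.0:
            return t + t0, False   # lower boundary reached
    return None, None

def simulate(n, v, a, z, t0, seed=1):
    """Simulate n trials and drop the (rare) trials that never terminate."""
    rng = random.Random(seed)
    trials = [simulate_trial(v, a, z, t0, rng=rng) for _ in range(n)]
    return [tr for tr in trials if tr[0] is not None]

# Illustrative parameter values only (not estimates from this study).
trials = simulate(2000, v=1.0, a=1.5, z=0.5, t0=0.3)
p_upper = sum(hit for _, hit in trials) / len(trials)
mean_rt = sum(rt for rt, _ in trials) / len(trials)
print(f"P(upper) = {p_upper:.3f}, mean RT = {mean_rt:.3f} s")
```

With a positive drift and an unbiased starting point, most simulated trials end at the upper boundary, and every simulated RT exceeds \(t_0\) because the non-decision time is added to the decision time.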
By fitting the DDM to participants’ accuracy and response time distributions, one can infer how these latent parameters differ between groups (e.g. young vs. old) or conditions. This model-based approach has proved especially illuminating in aging research. Rather than relying on overall slowing measures alone, the DDM allows researchers to pinpoint which aspects of processing slow down or change with age and which remain intact.
The DDM mathematically formalizes the behavioral patterns described above. The model confirms that older adults’ slower responses are largely due to shifts in caution and peripheral processing, rather than uniformly impaired evidence accumulation. Specifically, older adults consistently exhibit significantly higher decision thresholds (\(a\)) than younger adults across a variety of tasks (Ratcliff et al., 2004; Ratcliff & McKoon, 2008), quantitatively capturing the strategic slowing: by widening the distance between boundaries, older adults counteract internal noise and maintain high accuracy. Additionally, the model isolates the contribution of peripheral slowing; older adults typically show increased non-decision times (\(t_0\)), often 80–100 ms longer than young adults (Ratcliff et al., 2004), reflecting age-related delays in motor execution and sensory encoding. Crucially, the DDM reveals that drift rate (\(v\))—the core measure of cognitive processing efficiency—is often remarkably preserved in aging for simple perceptual tasks (Ratcliff et al., 2001, 2003, 2004).
However, as noted previously, this preservation is not universal: in tasks taxing memory retrieval (Spaniol et al., 2006) or complex visual search (Madden & Allen, 1991), modeling confirms a decline in drift rates, indicating that for complex cognitive operations, the older brain does accumulate evidence more slowly. Furthermore, while older adults can adjust their boundaries, they often exhibit a rigidity in this setting; Starns and Ratcliff (2010) demonstrated that older adults fail to lower their boundaries (\(a\)) optimally under speed pressure, prioritizing accuracy even when the task incentivizes speed. Regarding bias (\(z\)), healthy aging is generally not associated with systematic shifts in starting point for simple tasks, though specific biases can emerge in memory paradigms (e.g., a conservative bias against “new” items to avoid false alarms) (Ratcliff et al., 2004; Spaniol et al., 2006). Overall, the application of drift-diffusion modeling provides a nuanced portrait of aging: it mathematically separates the strategic adaptations (increased \(a\)) and peripheral slowing (increased \(t_0\)) from the fundamental cognitive capacity (drift rate \(v\)), which remains intact in simple contexts but declines under high cognitive load.
Having established the baseline DDM profile of older adults, we turn to an important modulating factor: arousal and effort. In this chapter, we leverage the diffusion model to investigate how fluctuations in arousal (induced via physical effort) can alter these latent decision processes in older adults.
1.3 Arousal, Effort, and Decision Performance in Older Adults
Beyond baseline aging effects, cognitive performance is strongly influenced by the organism’s arousal state—the level of alertness or activation of physiological and neural systems. Classic theory, dating back to the Yerkes–Dodson law (Yerkes & Dodson, 1908), holds that the relationship between arousal and performance follows an inverted-U function: increasing arousal enhances performance up to an optimal point, after which further arousal (especially if reaching stress or anxiety levels) impairs performance. In the context of aging, this dynamic takes on special significance. Adaptive Gain Theory (AGT) (Aston-Jones & Cohen, 2005) provides a neural mechanism for this relationship, linking phasic and tonic Locus Coeruleus activity to optimal task performance. When this framework is extended to aging, researchers posit that the arousal–performance curve is altered, often manifesting as a leftward shift or compression of the inverted-U function (Mather & Harley, 2016; Mikneviciute et al., 2022). This implies that older adults may reach their “optimal” arousal peak at lower levels of objective demand than younger adults. Consequently, levels of effort or stress that might be engaging or beneficial for a younger adult (placing them at the peak of the curve) can push an older adult onto the “descending limb,” leading to supra-optimal arousal and performance decrements (Huang & Clewett, 2024; Mather & Harley, 2016).
Older adults typically have a reduced physiological capacity to sustain high arousal yet often need to exert greater mental effort to perform a given task at the same level as a younger person. Recent studies support the idea that effortful engagement is more taxing for older adults in measurable ways. For example, Hess and Ennis (2012) demonstrated that when older adults performed continuous arithmetic tasks (e.g., subtraction), they exhibited significantly larger increases in systolic blood pressure (SBP), a physiological index of effortful arousal, than young adults, and this elevated physiological cost predicted greater fatigue on subsequent tasks. Furthermore, research directly relevant to the current paradigm has shown that concurrent physical effort can be detrimental to cognition in aging. Azer et al. (2023) found that while maintaining a concurrent moderate isometric handgrip (30% of maximum voluntary contraction, MVC), older adults showed significantly reduced accuracy in a visual working memory task with distractors, whereas younger adults remained unaffected. This supports the limited-capacity framework, suggesting that shared processing resources (Wickens, 2008) are more easily depleted in older adults, or that the combined demand drives arousal into a dysregulated state (Verhaeghen et al., 2003).

What are the expected effects of arousal fluctuations on the decision-making mechanisms of older adults? By applying the drift-diffusion model (DDM), we can make specific hypotheses about how effort-induced arousal will modulate latent decision parameters. The present study manipulates arousal via physical effort (5% vs. 40% MVC), providing a controlled way to “push” older participants along the arousal curve.
Behavioral theories such as Resource Competition and Limited Capacity predict that excessive effort will siphon processing resources away from the decision process, degrading evidence quality, whereas Adaptive Gain Theory predicts that older adults will more quickly slide down the descending limb of the inverted-U curve when demands are high. These frameworks collectively motivate the computational predictions outlined after we review the relevant physiological mechanisms.
1.4 Arousal Dynamics and the Locus Coeruleus-Norepinephrine System
Beyond the structural decision parameters captured by the DDM, cognitive performance is dynamically modulated by the brain’s arousal state. A central regulator of this arousal is the Locus Coeruleus-Norepinephrine (LC-NE) system, a small brainstem nucleus that serves as the primary source of norepinephrine to the forebrain (Aston-Jones & Cohen, 2005). The LC-NE system modulates the “neural gain” of cortical circuits—essentially the signal-to-noise ratio of information processing. According to Adaptive Gain Theory (AGT), optimal performance relies on a balance between two modes of LC activity: a moderate tonic (baseline) firing rate that promotes focused attention, and robust phasic (event-related) bursts that facilitate rapid behavioral responses to task-relevant stimuli (Aston-Jones & Cohen, 2005; Gilzenrat et al., 2010).
In the context of aging, this system undergoes significant changes. While structural degradation of the LC is common in older adults (Mather & Harley, 2016), functional compensatory mechanisms often emerge. Older adults may exhibit chronically elevated tonic arousal or hyper-responsivity to challenge, potentially as a strategy to offset neural inefficiency (Lee et al., 2018; Mather et al., 2016). However, this compensation has limits; Adaptive Gain Theory suggests that the relationship between arousal and performance follows an inverted-U function (Yerkes & Dodson, 1908), which in older adults may be shifted or compressed (Mather & Harley, 2016). Consequently, levels of physical or cognitive effort that would optimize arousal in younger adults might push older adults into a supra-optimal state (the “descending limb” of the curve), where excessive norepinephrine release leads to distractibility, indiscriminate processing, and performance decrements (Aston-Jones & Cohen, 2005; Eldar et al., 2013).
Pupillometry provides a powerful, non-invasive window into these LC-NE dynamics. Because pupil diameter tracks LC firing activity with high temporal precision (Joshi et al., 2016; Murphy et al., 2014), it serves as a proxy for both tonic and phasic arousal states.
- Baseline pupil diameter: reflects tonic LC activity and general alertness levels (Gilzenrat et al., 2010).
- Task-evoked pupil response (TEPR): reflects phasic LC activation and the mobilization of mental effort during task execution (Beatty, 1982). Specifically, the amplitude and latency of the TEPR, often quantified as the area under the curve (AUC), have been linked to the subjective difficulty of a task and the cognitive resources recruited to perform it (Kahneman & Beatty, 1966; Van Gerven et al., 2004).
Crucially, recent computational work has begun to bridge the gap between these physiological arousal signals and the latent cognitive processes of decision-making. In a seminal study, Cavanagh et al. (2014) demonstrated that eye tracking and pupillometry serve as indicators of dissociable latent decision processes. By applying hierarchical Bayesian DDM to a probabilistic learning task, they found that while gaze dwell time predicted the rate of evidence accumulation (drift rate), pupil dilation specifically predicted an increase in the decision threshold (\(a\)) during high-conflict choices. This finding fundamentally reframed the role of phasic arousal in decision-making: rather than merely “energizing” the system generally, the pupil-linked arousal response can act as a specific signal for cognitive control, prompting a “hold your horses” mechanism (Frank, 2006) that raises the decision boundary to prevent impulsive errors. This link between pupil dilation and threshold adjustment (\(a\)) has since been corroborated by others, suggesting it may be a general marker of decision uncertainty and conflict monitoring (Urai et al., 2017; Wel & Steenbergen, 2018).
This connection is vital for the present study. By integrating pupillometry with DDM, we can move beyond simple behavioral outcomes to ask mechanistic questions about how physical effort impacts the aging brain. Does the physical arousal from a high-effort handgrip act as a beneficial boost that sharpens neural gain (increasing drift rate, \(v\)), as Adaptive Gain Theory might predict for moderate arousal? Or, does it trigger a conflict signal that prompts older adults to become more conservative (increasing threshold, \(a\)), as suggested by the work of Cavanagh et al. (2014)? Alternatively, if the effort pushes older adults into a supra-optimal state, does the pupil signal reflect internal noise that degrades evidence quality (decreasing \(v\))? This combined pupil-DDM approach allows us to directly test these competing hypotheses by linking observable physiological states to the latent computational components of the decision process.
1.4.1 Hypotheses and Predictions
Grounded in the behavioral and physiological frameworks above, we test four preregistered predictions:
- Drift rate (\(v\)) will decrease under high effort (40% MVC) relative to low effort, reflecting degraded evidence accumulation from resource competition or supra-optimal arousal.
- Boundary separation (\(a\)) will increase under high effort, reflecting the conflict-control signal suggested by phasic pupil-linked arousal and older adults’ strategic caution.
- Non-decision time (\(t_0\)) may increase modestly under high effort, reflecting cognitive-motor interference during concurrent grip maintenance.
- Starting bias (\(z\)) may move toward 0.5 if high-effort trials evoke strong phasic LC-NE responses that “reset” pre-existing response tendencies (Gee et al., 2020), with the magnitude of any shift moderated by LC integrity (Huang & Clewett, 2024).
1.5 Conclusion and Overview of the Present Study
In this chapter, we investigate how effort-induced arousal modulates decision-making in older adults at a computational level. By applying hierarchical Bayesian drift-diffusion modeling (HDDM) to behavioral data from older participants under low-effort vs. high-effort conditions, we test whether heightened arousal degrades evidence accumulation, triggers compensatory increases in boundary separation, alters bias, or slows non-decision processes. By decomposing older adults’ performance with the DDM, we can pinpoint the locus of the arousal effect: is it degrading the evidence itself, shifting the strategic criterion, resetting biases, or altering peripheral processing speeds? This approach allows us to move beyond simple outcome measures (mean reaction time or accuracy) to visualize the mechanism of how a physiologically loaded aging brain arrives at a choice. The broader significance of this work lies in understanding whether the characteristic cautiousness and processing inefficiencies of older decision-makers are fixed traits or dynamic features modulated by physiological state. If arousal can “tune” decision parameters in predictable ways, it suggests that decision performance in aging is not static but state-dependent. If excessive effort proves detrimental as hypothesized, it highlights the critical importance of effort regulation and stress management for older individuals in demanding environments. Ultimately, by mathematically decomposing these effects, we seek to clarify whether boosting arousal in older adults helps “overclock” their decision processes or instead exacerbates underlying capacity limits. This knowledge contributes to a more comprehensive theory of cognitive aging—one that accounts for both the baseline architectural changes of the brain and the dynamic, moment-to-moment influence of internal physiological states.
2 Methods
2.1 DDM Implementation
2.1.1 Parameter Transformations (Link Functions)
To ensure parameter constraints, we apply link functions (Bürkner, 2017; Ratcliff & Tuerlinckx, 2002). The model was implemented using the brms package with family = wiener(link_bs="log", link_ndt="log", link_bias="logit"):
- Drift rate: \(v = \beta_v\) (identity link)
- Boundary separation: \(a = \exp(\beta_{\text{bs}})\) (log link, ensures \(a > 0\))
- Non-decision time: \(t_0 = \exp(\beta_{\text{ndt}})\) (log link, ensures \(t_0 > 0\))
- Starting-point bias: \(z = \text{logit}^{-1}(\beta_{\text{bias}}) = \frac{\exp(\beta_{\text{bias}})}{1 + \exp(\beta_{\text{bias}})}\) (logit link, ensures \(z \in (0,1)\))
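The four transformations above can be sketched in a few lines of Python. This is an illustration of the link functions only, not the brms implementation. The helper names are hypothetical, and apart from the Standard-trial drift of -1.26 reported later in this chapter, the input values are made up for demonstration.

```python
import math

def inv_logit(x):
    """Inverse logit: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def to_natural_scale(beta_v, beta_bs, beta_ndt, beta_bias):
    """Map linear-predictor values to DDM parameters via the inverse links."""
    return {
        "v": beta_v,                 # identity link
        "a": math.exp(beta_bs),      # log link, so a > 0
        "t0": math.exp(beta_ndt),    # log link, so t0 > 0
        "z": inv_logit(beta_bias),   # logit link, so 0 < z < 1
    }

# beta_bs, beta_ndt, and beta_bias are hypothetical values for illustration.
params = to_natural_scale(beta_v=-1.26, beta_bs=0.4, beta_ndt=-1.5, beta_bias=0.0)
for name, value in params.items():
    print(f"{name} = {value:.3f}")
```

A value of 0 on the logit scale corresponds to an unbiased starting point of \(z = 0.5\), and negative values on the log scale for \(t_0\) correspond to non-decision times below one second.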
2.1.2 Hierarchical Structure
For subject \(i\) and trial \(j\), the model parameters are (Bürkner, 2017; Ratcliff & McKoon, 2008):
\[v_{ij} = \beta_{v,0} + \sum_k \beta_{v,k} X_{k,ij} + u_{v,i} \tag{3}\]
\[a_{ij} = \exp\left(\beta_{\text{bs},0} + \sum_k \beta_{\text{bs},k} X_{k,ij} + u_{\text{bs},i}\right) \tag{4}\]
\[t_{0,ij} = \exp\left(\beta_{\text{ndt},0} + \sum_k \beta_{\text{ndt},k} X_{k,ij}\right) \tag{5}\]
\[z_{ij} = \text{logit}^{-1}\left(\beta_{\text{bias},0} + \sum_k \beta_{\text{bias},k} X_{k,ij} + u_{\text{bias},i}\right) \tag{6}\]
where \(\beta_{0}\) are population-level intercepts, \(\beta_k\) are population-level coefficients for predictors \(X_k\) (e.g., task, effort condition, difficulty level), and \(u_i \sim \mathcal{N}(0, \sigma^2_u)\) are subject-level random effects. Note that \(t_0\) is modeled without subject-level random effects to maintain model stability in the response-signal design.
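To make Equations 3 to 6 concrete, the sketch below assembles trial-level parameters from population-level coefficients, dummy-coded predictors, and subject-level offsets. Every number and name in it is hypothetical (none are fitted values), and the predictor coding is an assumption made for illustration.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear_predictor(intercept, coefs, x, u=0.0):
    """beta_0 + sum_k beta_k * X_k + u_i for one trial (before the link)."""
    return intercept + sum(coefs[name] * x[name] for name in coefs) + u

# Hypothetical dummy-coded predictors for one trial (Hard difficulty, High effort).
x = {"hard": 1, "high_effort": 1}

# Hypothetical population-level intercepts and coefficients (not fitted values).
fixed = {
    "v":    (0.80, {"hard": -1.00, "high_effort": -0.10}),
    "bs":   (0.50, {"hard": 0.05, "high_effort": 0.02}),
    "ndt":  (-1.50, {"hard": 0.00, "high_effort": 0.03}),
    "bias": (-0.20, {"hard": 0.00, "high_effort": 0.05}),
}

# Hypothetical subject-level offsets u_i; t0 has none, matching Eq. 5.
u_i = {"v": 0.30, "bs": -0.10, "bias": 0.15}

def trial_parameters(x, fixed, u_i):
    """Apply Eqs. 3-6: linear predictor per parameter, then the inverse link."""
    eta = {name: linear_predictor(b0, coefs, x, u_i.get(name, 0.0))
           for name, (b0, coefs) in fixed.items()}
    return {"v": eta["v"], "a": math.exp(eta["bs"]),
            "t0": math.exp(eta["ndt"]), "z": inv_logit(eta["bias"])}

params = trial_parameters(x, fixed, u_i)
print(params)
```

Because the non-decision time has no subject-level term, changing `u_i` shifts \(v\), \(a\), and \(z\) for that subject but leaves \(t_0\) unchanged.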
2.1.3 Likelihood Function
The likelihood for a single trial with RT \(t\) and decision \(d \in \{\text{"same"}, \text{"different"}\}\) follows the Wiener first-passage time distribution (Feller, 1968):
\[p(t, d | v, a, t_0, z) = \text{Wiener}(t - t_0 | v, a, z) \tag{7}\]
where the Wiener distribution gives the probability density of the first-passage time to boundary \(d\) given drift \(v\), boundary separation \(a\), and starting point \(z \cdot a\).
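For readers who want to see what Equation 7 evaluates, the following sketch computes the first-passage density with the standard large-time series representation and checks numerically that the two boundary densities integrate to one. This is an illustrative assumption about how the density can be computed; the fitted models rely on the sampler's own likelihood routine. The function names, the truncation at 100 terms, and the parameter values are hypothetical.

```python
import math

def wiener_lower_density(t, v, a, w, n_terms=100):
    """Large-time series for the first-passage density at the lower boundary."""
    if t <= 0:
        return 0.0
    series = sum(
        k * math.exp(-((k * math.pi) ** 2) * t / (2 * a * a)) * math.sin(k * math.pi * w)
        for k in range(1, n_terms + 1)
    )
    return (math.pi / (a * a)) * math.exp(-v * a * w - v * v * t / 2) * series

def wiener_density(rt, upper, v, a, t0, z, n_terms=100):
    """Density of Eq. 7: decision time rt - t0 ending at the chosen boundary."""
    t = rt - t0
    if upper:
        # Reflect the process: an upper-boundary hit with (v, z) is a
        # lower-boundary hit with (-v, 1 - z).
        return wiener_lower_density(t, -v, a, 1.0 - z, n_terms)
    return wiener_lower_density(t, v, a, z, n_terms)

def boundary_mass(upper, v, a, t0, z, step=0.002, t_max=8.0):
    """Riemann-sum approximation to the probability of ending at one boundary."""
    n = int(t_max / step)
    return step * sum(wiener_density(t0 + i * step, upper, v, a, t0, z)
                      for i in range(1, n + 1))

# Illustrative parameter values only (not estimates from this study).
v, a, t0, z = 1.0, 1.5, 0.3, 0.5
p_lower = boundary_mass(False, v, a, t0, z)
p_upper = boundary_mass(True, v, a, t0, z)
print(f"P(lower) = {p_lower:.4f}, P(upper) = {p_upper:.4f}, total = {p_lower + p_upper:.4f}")
```

The lower-boundary mass can be compared against the closed-form absorption probability \(\left(e^{-2va} - e^{-2vza}\right) / \left(e^{-2va} - 1\right)\), which gives a quick sanity check on the series.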
2.2 Decision Coding
We employed response-side coding (also called “stimulus coding” or “response coding”) where the upper boundary corresponds to “different” responses and the lower boundary corresponds to “same” responses (see Figure 1), rather than accuracy-based coding where boundaries represent correct/incorrect responses. This specification is necessary to disentangle response bias (a preference for one response alternative regardless of stimulus truth) from discriminability (drift rate) (Ratcliff & McKoon, 2008; Wiecki et al., 2013).
In accuracy-based coding, bias would imply a preference for being correct (which is conceptually trivial), whereas response-side coding allows us to model the meaningful preference for the “same” response option observed in detection tasks. This is particularly important for same/different discrimination tasks where participants often exhibit specific response biases (e.g., a conservative “same” bias) rather than general accuracy biases. Previous work linking arousal to decision-making has demonstrated that phasic arousal reduces response biases in detection tasks (Gee et al., 2020), and capturing this effect requires mapping boundaries to response alternatives.
On Standard (Δ=0) trials, participants chose “same” on 89.1% of trials and “different” on 10.9%—consistent with a conservative response tendency. The inclusion of Standard trials provides a critical constraint for estimating bias. While Standard trials theoretically have zero objective evidence difference (Δ=0), our model estimated a strong negative drift rate (v ≈ -1.26 in the primary model), indicating that participants actively accumulate evidence toward “same” responses when stimuli are identical. The observed preference for “same” responses reflects the combined effects of both drift and starting-point bias, with drift dominating the decision process. This bias estimate would be unobtainable using accuracy-coded models, where Standard trials would be ambiguous (both “same” and “different” responses are technically correct when stimuli are identical). Response-side coding was implemented directly from the raw data using the resp_is_diff column, which explicitly records whether each trial was a “different” response (TRUE) or “same” response (FALSE), ensuring accurate mapping to DDM boundary assignments.
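A minimal sketch of the response-side mapping is shown below. Only the column name resp_is_diff comes from the dataset described above; the trial records, the `dec_upper` field, and the helper function are hypothetical and serve only to illustrate the boundary assignment.

```python
# Hypothetical trial records; only the resp_is_diff column name comes from the text.
trials = [
    {"difficulty": "Standard", "resp_is_diff": False},
    {"difficulty": "Standard", "resp_is_diff": False},
    {"difficulty": "Standard", "resp_is_diff": True},
    {"difficulty": "Easy", "resp_is_diff": True},
    {"difficulty": "Hard", "resp_is_diff": False},
]

def boundary_code(resp_is_diff):
    """Upper boundary (1) = "different" response, lower boundary (0) = "same"."""
    return 1 if resp_is_diff else 0

for trial in trials:
    trial["dec_upper"] = boundary_code(trial["resp_is_diff"])

# Share of "same" responses on Standard trials in this toy sample.
standard = [t for t in trials if t["difficulty"] == "Standard"]
prop_same = sum(1 - t["dec_upper"] for t in standard) / len(standard)
print(f"Share of 'same' responses on Standard trials: {prop_same:.3f}")
```

Under this coding the boundary depends only on the response given, not on whether it was correct, which is what allows the starting point to be read as a response bias.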
2.3 Computational Methods
All analyses were performed using R version 4.5.2 (2025-10-31) “[Not] Part in a Rumble” (R Core Team, 2025) on macOS (aarch64-apple-darwin20, Apple Silicon). Bayesian hierarchical models were fitted using brms (Bürkner, 2017, 2018) with CmdStan (Stan Development Team, 2024) via cmdstanr (Gabry & Češnovar, 2021) as the backend. Model comparison was conducted using leave-one-out cross-validation via the loo package (Vehtari et al., 2017). Data manipulation and visualization used dplyr (Wickham, François, et al., 2023), tidyr (Wickham, Vaughan, et al., 2023), readr (Wickham et al., 2024), and ggplot2 (Wickham, 2016). Tables were generated using gt (Iannone et al., 2024). Posterior analysis and diagnostics used the posterior package (Bürkner et al., 2022). Code development and debugging were performed using Cursor (AI-assisted code editor), and the document was rendered using Quarto (Allaire et al., 2022).
MCMC Sampling Specifications:
- Algorithm: NUTS (No-U-Turn Sampler)
- Chains: 4
- Iterations: 8,000 per chain (4,000 warmup, 4,000 sampling)
- Convergence criteria: \(\hat{R}\) ≤ 1.01 (Gelman & Rubin, 1992; Vehtari et al., 2021), minimum bulk/tail ESS ≥ 400
2.4 Sample & Experimental Design
2.4.1 Participants
Participants were 67 older adults (≥65 years; mean age = 71.3 years, SD = 4.8). This analysis uses the same dataset and participants as described in the LC behavioral report manuscript (see References). All participants provided informed consent and received course credit or financial compensation for participation. Study procedures were approved by the Institutional Review Board of the University of California, Riverside, and all experimental procedures were performed in accordance with the approved guidelines and regulations.
Note: 12 participants performed near or below chance (≤55% accuracy) in some conditions but were retained to maximize sample size, because hierarchical modeling partially pools information across participants and thereby stabilizes their estimates. Sensitivity analyses confirmed that their inclusion did not alter the main effects.
2.4.2 Tasks and Conditions
Tasks: Auditory Detection Task (ADT) and Visual Detection Task (VDT) were modeled jointly with ‘task’ as a fixed effect. This approach uses a single random effect variance parameter for subject-level variability across both tasks, allowing the model to share information between tasks and stabilize subject-specific estimates through hierarchical shrinkage while estimating task-specific offsets. [Detailed task descriptions, stimulus parameters, and equipment specifications are provided in the LC behavioral report manuscript; see References.]
Conditions (within-subjects, factorial design):
- Difficulty: Standard (Δ=0), Easy, Hard
- Effort: Low (5% MVC), High (40% MVC)
Total design cells: 2 tasks × 3 difficulty levels × 2 effort conditions = 12 cells per subject.
Total trials analyzed: 17,834 (after exclusions). Standard trials: 3,597 (20.2%).
2.5 Trial Timeline (Response-Signal Design)
RT definition: Time from response-screen onset (response-signal design). This is a critical methodological detail: RTs are measured from when the response screen appears (after the stimulus presentation period), not from stimulus onset. This design constrains the interpretation of t₀ (non-decision time) to primarily reflect motor execution and response selection rather than the sum of encoding + motor time as in traditional RT tasks. The response-signal design rationale is described in detail in the LC behavioral report manuscript.
Filtering: The DDM analysis applies a 250 ms lower-bound cutoff for anticipatory responses. While a 150–200 ms cutoff is standard for young adult populations (Whelan, 2008), older adults consistently exhibit longer non-decision times (\(t_0\), often written \(T_{er}\)), reflecting age-related slowing in stimulus encoding and motor execution. Specifically, diffusion modeling in aging populations estimates that non-decision time is approximately 80–100 ms longer in older adults than in younger adults (Ratcliff et al., 2001, 2004). A 250 ms threshold therefore provides a conservative lower bound that adjusts for this shift, so that excluded trials represent genuinely anticipatory responses rather than the leading edge of the valid decision distribution (Woods et al., 2015). In practice, preprocessing had already excluded trials with RT < 200 ms (see the Trial Exclusions section below), and all remaining trials had RT ≥ 250 ms, so the 250 ms threshold removed no additional trials. The upper bound of 3.000 s reflects the maximum response window in the task design; no upper-bound filtering was applied post-experiment.
2.5.1 Data Quality Assurance
2.5.1.1 Trial Exclusions
Trial exclusions were applied during data preprocessing. The following table summarizes exclusions by filter type:
Trial Exclusions Summary

| Filter Applied | Trials Remaining | Trials Removed | % Remaining (cumulative) | % Removed (cumulative) |
|---|---|---|---|---|
| Starting trials | 19,740 | 0 | 100.00 | 0.00 |
| RT < 200 ms¹ | 19,495 | 245 | 98.76 | 1.24 |
| Missed responses | 19,194 | 301 | 97.23 | 2.77 |
| Invalid run performance | 16,958 | 2,236 | 85.91 | 14.09 |
| Final trials (Preprocessing) | 16,958 | 2,782 (total) | 85.91 | 14.09 |
| Restored (Audit) | 17,243 | −285 | 87.35 | 12.65 |
| Final Analysis N | 17,243 | 0 | 87.35 | 12.65 |

¹ RT < 200 ms: Anticipatory responses excluded during preprocessing. The DDM analysis used a 250 ms cutoff, but no additional trials were excluded. 285 trials were restored after a decision coding audit confirmed their validity.
Summary: Of 19,740 starting trials, 2,782 trials (14.1%) were excluded during preprocessing:
- 245 trials (1.2%) excluded for RT < 200 ms (anticipatory responses)
- 301 trials (1.5%) excluded for missed responses
- 2,236 trials (11.3%) excluded for invalid run performance
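The running totals in the exclusion table can be reproduced with a short arithmetic check. The sketch below uses only the counts reported in the table; the variable names are hypothetical.

```python
start = 19_740
steps = [
    ("RT < 200 ms", 245),
    ("Missed responses", 301),
    ("Invalid run performance", 2_236),
]

remaining = start
rows = []
for label, removed in steps:
    remaining -= removed
    pct_remaining = round(100 * remaining / start, 2)
    rows.append((label, remaining, removed, pct_remaining))

total_removed = start - remaining          # exclusions during preprocessing
restored = 285                             # trials restored after the coding audit
final_n = remaining + restored
final_pct = round(100 * final_n / start, 2)

for label, left, removed, pct in rows:
    print(f"{label:<26} remaining={left:>6,} removed={removed:>5,} ({pct}% remaining)")
print(f"Preprocessing total removed: {total_removed:,}")
print(f"After audit restoration: {final_n:,} ({final_pct}% of starting trials)")
```

The stepwise counts sum to the 2,782 preprocessing exclusions, and adding back the 285 restored trials yields the 17,243 trials listed in the last row of the table.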
Final dataset after preprocessing: 16,958 trials (85.9% retention) from 65 subjects. Additional data processing steps (e.g., decision coding verification, quality checks) resulted in the final analysis dataset of 17,834 trials from 67 subjects. Two additional subjects were included after verification of their data quality during the decision coding audit. The dataset was updated to use the latest raw behavioral data file (bap_beh_trialdata_v2.csv) with direct response-side coding from the resp_is_diff column. Note: The DDM analysis applies a more conservative 250 ms lower-bound cutoff (see Filtering section above) based on age-related non-decision time shifts, but no additional trials were excluded as all remaining trials had RT ≥ 250 ms.
2.5.2 Subject Inclusion & Decision Coding Audit
**Subject Inclusion:**
- Total subjects: 67
- Near- or below-chance performers (≤55% accuracy): 12
- Mean overall accuracy: 63.3%
**Decision Coding Audit:**
- Total trials: 17,243
- Decision coding mismatches: 0
- Mismatch rate: 0.0000
Result: All 67 subjects were retained; no near- or below-chance performers were excluded. Decision coding verification confirmed zero mismatches across all trials. Decision coding methodology is discussed in detail in the Decision Coding section above.
2.5.3 Manipulation Checks
To confirm the experimental manipulations worked as intended, we conducted mixed-effects analyses on accuracy and RT independent of any DDM assumptions. Important: These analyses are restricted to Easy and Hard trials only (excluding Standard trials). Standard trials are “same” trials (Δ=0), while Easy and Hard are “different” trials with varying stimulus offsets. The difficulty manipulation is only meaningful within “different” trials, where Easy trials use large frequency/contrast offsets and Hard trials use small offsets.
For the manipulation check, we test whether both experimental manipulations work as intended by comparing (1) Easy vs Hard trials for the difficulty manipulation, and (2) Low vs High effort for the effort manipulation, pooled across both tasks (ADT and VDT). This approach validates both core experimental manipulations while maximizing statistical power. Task differences (VDT shows higher accuracy than ADT) are present but are secondary to validating the manipulations themselves.
2.5.3.1 Accuracy: Generalized Linear Mixed Model
Model: decision ~ difficulty + effort + (1 | subject), restricted to Easy and Hard trials only (N = 13,771 trials, pooled across ADT and VDT). Reference levels: Easy, Low_5_MVC.
Accuracy GLMM Results

| Term | β (log-odds) | SE | statistic | p | 95% CI |
|---|---|---|---|---|---|
| Intercept | 2.06 | 0.124 | 16.58 | <.001 | [1.81, 2.30] |
| Difficulty: Hard | -2.97 | 0.050 | -59.54 | <.001 | [-3.07, -2.88] |
| Effort: High | -0.15 | 0.045 | -3.30 | <.001 | [-0.23, -0.06] |
Key findings:
- Hard vs. Easy: Hard trials showed substantially lower accuracy than Easy (β = -2.97, p < .001). Easy trials had 85.2% accuracy, while Hard trials had 30.5% accuracy (well below chance). This reflects the increased difficulty of detecting small frequency/contrast differences on “different” trials, demonstrating a strong effect of stimulus difference magnitude on discrimination performance.
- High vs. Low Effort: High effort (40% MVC) showed slightly lower accuracy than Low effort (5% MVC) (β = -0.15, p < .001). Low effort had 58.5% accuracy, while High effort had 56.8% accuracy. This pattern is consistent with the higher grip force interfering with cognitive performance, potentially through dual-task resource competition between maintaining grip force and performing the discrimination task.
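Because the GLMM coefficients are on the log-odds scale, it can help to convert them to probabilities and odds ratios. The sketch below does this for the rounded coefficients in the table, assuming a participant whose random intercept is zero. The resulting values are therefore conditional predictions and need not match the raw cell percentages quoted above; variable names are hypothetical.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Rounded coefficients from the accuracy GLMM table (log-odds scale).
intercept, b_hard, b_high = 2.06, -2.97, -0.15

# Predicted accuracy for a participant with a random intercept of zero.
p_easy_low = inv_logit(intercept)
p_hard_low = inv_logit(intercept + b_hard)
p_easy_high = inv_logit(intercept + b_high)

# Odds ratio for High vs. Low effort.
or_effort = math.exp(b_high)

print(f"Easy, Low effort:  {p_easy_low:.3f}")
print(f"Hard, Low effort:  {p_hard_low:.3f}")
print(f"Easy, High effort: {p_easy_high:.3f}")
print(f"Odds ratio (High vs. Low effort): {or_effort:.3f}")
```

On this scale the effort coefficient corresponds to roughly 14% lower odds of a correct response under High effort, which is small next to the difficulty effect.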
2.5.3.2 RT: Linear Mixed Model on Median RT
Model: rt_median ~ difficulty + effort + (1 | subject), restricted to Easy and Hard trials only (N = 13,771 trials, pooled across ADT and VDT). Reference levels: Easy, Low_5_MVC.
| RT LMM Results | ||||
|---|---|---|---|---|
| Term | β (seconds) | SE | t | 95% CI |
| Intercept | 0.792 | 0.034 | 23.64 | [0.727, 0.858] |
| Difficulty: Hard | 0.232 | 0.017 | 13.28 | [0.197, 0.266] |
| Effort: High | 0.016 | 0.017 | 0.92 | [-0.018, 0.050] |
Key findings:
- Hard vs. Easy: Hard trials were slower than Easy (β = 0.23 s, 95% CI [0.20, 0.27]). Easy trials had a median RT of 0.75 s (mean 0.90 s), while Hard trials had a median RT of 1.01 s (mean 1.12 s). This reflects slower decision-making when stimulus differences are smaller and harder to detect.
- High vs. Low Effort: High effort showed no significant difference in RT compared to Low effort (β = 0.02 s, 95% CI [-0.02, 0.05]). Low effort had a median RT of 0.86 s (mean 1.00 s), while High effort had a median RT of 0.89 s (mean 1.02 s). The effort manipulation did not significantly affect reaction time, suggesting that the dual-task demands primarily affected accuracy rather than response speed.
Conclusion: Both experimental manipulations worked as intended. The difficulty manipulation (Easy vs. Hard within “different” trials) showed strong effects on both accuracy and RT in theoretically expected directions: larger stimulus differences (Easy) led to higher accuracy (85.2% vs. 30.5%) and faster RTs (0.75 s vs. 1.01 s median) compared to smaller differences (Hard). The effort manipulation (Low vs. High MVC) showed a small but significant effect on accuracy, with High effort slightly reducing accuracy (56.8% vs. 58.5%), likely due to dual-task resource competition. However, effort did not significantly affect RT. These results validate the experimental design prior to DDM analysis.
2.6 Model Specifications
2.6.1 Standard-Only Bias Calibration Model
To isolate bias identification from drift, we fit a single hierarchical Wiener DDM to Standard trials only (3,597 trials from 67 subjects). The model uses parameter-specific formulas to specify how predictors map onto each DDM parameter:
- Drift (v): rt | dec(dec_upper) ~ 1 + (1|subject_id), with a relaxed normal(0, 2) prior to allow for potential negative drift
- Boundary (a/bs): bs ~ 1 + (1|subject_id) (intercept plus subject random effects)
- Non-decision time (t₀/ndt): ndt ~ 1 (intercept only; response-signal design)
- Bias (z): bias ~ task + effort_condition + (1|subject_id) (task and effort effects plus subject random effects)
Critical constraint on bias: Starting-point bias (\(z\)) was allowed to vary by Task and Effort (which are known to the participant pre-trial) but was constrained to be constant across Difficulty levels, as trial difficulty was randomized and thus unknown at the onset of the decision process. This specification reflects the causal structure of the experimental design: participants cannot adjust their starting point based on an unknown future event (trial difficulty). Task differences in bias (if tasks were blocked) and effort differences (if effort was cued) are valid pre-stimulus settings, whereas difficulty-dependent bias would imply participants could anticipate trial difficulty, which contradicts the randomized design.
Note: These formulas are all part of one model fitted simultaneously. The bf() function in brms allows specification of separate formulas for each DDM parameter (drift, boundary, non-decision time, bias) within a single hierarchical model.
Drift prior rationale: While Standard trials theoretically have zero evidence (Δ=0), we used a relaxed drift prior (normal(0, 2)) rather than a tight prior to allow the model to capture any systematic drift patterns that might emerge from the data. This approach recognizes that even on Standard trials, participants may accumulate evidence toward “same” responses, which is consistent with the observed 89.1% “same” response rate. A tight prior forcing drift to zero would be inappropriate if participants are systematically accumulating evidence toward one boundary. The relaxed prior allows the model to estimate drift and bias jointly, with both parameters contributing to the observed choice proportions.
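The joint contribution of drift and starting point to choice proportions can be made explicit with the standard first-passage result for a Wiener process. As a sketch, assuming a unit diffusion coefficient and no across-trial variability, and writing \(P_{\text{upper}}\) (a symbol introduced here) for the probability of terminating at the upper, “different” boundary:

\[
P_{\text{upper}} = \frac{1 - e^{-2 v a z}}{1 - e^{-2 v a}} \quad (v \neq 0), \qquad \lim_{v \to 0} P_{\text{upper}} = z .
\]

With \(z\) fixed, a more negative \(v\) lowers \(P_{\text{upper}}\); with \(v\) fixed, a larger \(z\) raises it. A high “same” rate can therefore arise from negative drift, from \(z < 0.5\), or from both, which is why the two parameters are estimated jointly rather than fixing drift at zero.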
Note: An earlier “Joint Confirmation Model” specification included difficulty in the bias formula, but this was methodologically incorrect as trial difficulty is randomized and thus unknown pre-trial. The primary model (described below) uses the correct specification where bias does not vary by difficulty.
2.6.2 Primary Analysis Model
The primary model is a single hierarchical Wiener DDM that includes difficulty effects on v and a, with task and effort as additive factors. The model uses parameter-specific formulas:
- Drift (v): rt | dec(dec_upper) ~ difficulty_level + task + effort_condition + (1 + difficulty_level | subject_id)
- Boundary (a/bs): bs ~ difficulty_level + task + (1 | subject_id)
- Non-decision time (t₀/ndt): ndt ~ task + effort_condition (no random effects)
- Bias (z): bias ~ task + effort_condition + (1 | subject_id) (task and effort effects plus subject random effects)
Critical constraint on bias: As in the Standard-only model (Section 2.6.1), starting-point bias (\(z\)) varies by Task and Effort, which are known to the participant before the trial, but is held constant across Difficulty levels, because trial difficulty was randomized and therefore unknown at the onset of the decision process.
Note: As in Section 2.6.1, all four formulas are fitted simultaneously within a single bf() specification in brms. The dec_upper variable (1 = “different”, 0 = “same”) is taken directly from the resp_is_diff column of the raw data, so that “different” responses map to the upper boundary and “same” responses to the lower boundary.
Rationale for ndt formula: In the response-signal design, t₀ primarily reflects motor execution. To avoid identifiability issues and maintain model stability, we modeled t₀ with group-level task and effort effects only, omitting subject-level random effects. The response-signal task design and its implications for DDM parameter interpretation are described in the LC behavioral report manuscript (see References).
2.6.3 Priors
All priors are weakly informative and set on the link scale:
Intercepts:
- v Intercept ~ Normal(0, 1)
- bs Intercept ~ Normal(log(1.7), 0.30) → a ≈ 1.7 on natural scale
- ndt Intercept ~ Normal(log(0.23), 0.12) → t₀ ≈ 230 ms on natural scale
- bias Intercept ~ Normal(0, 0.5) → z ≈ 0.5 (no bias) on probability scale
Slopes:
- v slopes: Normal(0, 0.6–0.7)
- bs slopes: Normal(0, 0.25–0.30)
- bias slopes: Normal(0, 0.35)
Random effects:
- Standard deviations: Student-t(3, 0, 0.30)
- Correlations: LKJ(2)
Sampling controls: NUTS with adapt_delta = 0.995, max_treedepth = 15. Four chains, 8,000 iterations (4,000 warmup).
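Because the priors are stated on the link scale, it can help to see what range they imply on the natural scale. The sketch below back-transforms the central 95% prior interval for each intercept, assuming a log link for bs and ndt and a logit link for bias (as implied by the annotations in the list above). The helper names are ours.

```python
import math

Z95 = 1.959964  # standard normal quantile for a central 95% interval

def inv_logit(x: float) -> float:
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_interval(mu: float, sd: float) -> tuple[float, float]:
    """Central 95% interval of a Normal(mu, sd) prior on the link scale."""
    return (mu - Z95 * sd, mu + Z95 * sd)

# Boundary separation: Normal(log(1.7), 0.30) on the log scale.
lo, hi = normal_interval(math.log(1.7), 0.30)
a_interval = (math.exp(lo), math.exp(hi))

# Non-decision time: Normal(log(0.23), 0.12) on the log scale (seconds).
lo, hi = normal_interval(math.log(0.23), 0.12)
t0_interval = (math.exp(lo), math.exp(hi))

# Starting-point bias: Normal(0, 0.5) on the logit scale.
lo, hi = normal_interval(0.0, 0.5)
z_interval = (inv_logit(lo), inv_logit(hi))

print(f"a:  {a_interval[0]:.2f} to {a_interval[1]:.2f}")
print(f"t0: {1000 * t0_interval[0]:.0f} to {1000 * t0_interval[1]:.0f} ms")
print(f"z:  {z_interval[0]:.2f} to {z_interval[1]:.2f}")
```

Under these assumptions the priors cover roughly 0.94 to 3.06 for a, 182 to 291 ms for t₀, and 0.27 to 0.73 for z, which is wide enough to be weakly informative while excluding implausible values.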
2.6.3.1 Prior vs. Posterior for Non-Decision Time
Interpretation: The posterior for t₀ is well-informed by the data while remaining compatible with the weakly informative prior, confirming adequate identifiability for the group-level intercept despite the response-signal design.
2.7 Model Comparison & Diagnostics
2.7.1 Model Comparison
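We compared 10 candidate models varying in how difficulty, task, and effort map onto DDM parameters. Leave-one-out cross-validation (LOO-CV) was used to rank the models by expected out-of-sample predictive accuracy (ELPD), with higher (less negative) ELPD indicating better predictive performance.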
2.7.1.1 LOO Summary
| Model Comparison: LOO-CV Results | |||
|---|---|---|---|
| Model | ELPD | SE | P_loo |
| Difficulty → v + a + z | -17007.01 | 148.39 | 192.35 |
Winner: The model with difficulty → (v + a) is strongly favored, with bias constrained to be constant across difficulty levels.
- ΔELPD vs. v-only: ≈ +185 (SE ≈ 20)
- Stacking weight: ≈ 0.89
- PBMA weight: ≈ 1.0
Pareto-k diagnostics: 1/17,834 observations had k > 0.7; moment matching was not required.
Interpretation: The data strongly support a model in which task difficulty modulates drift rate and boundary separation. Starting-point bias is constrained to be constant across difficulty levels (as trials are randomized), but varies by task and effort. Simpler models (e.g., difficulty affecting only drift) are decisively rejected by cross-validation.
2.7.2 Model Diagnostics
| Convergence & PPC Gate (Primary Model) | |
|---|---|
| Metric | Value |
| model_file | fit_primary_vza_vEff_censored.rds |
| timestamp | 2025-11-19 13:07:23 |
| conv_max_rhat | 1.003 |
| conv_min_bulk_ess | 804.755 |
| conv_min_tail_ess | NA |
| conv_divergences | 0 |
| conv_pass | TRUE |
| loo_elpd | -14758.47 |
| loo_se | 147.406 |
| loo_max_pareto_k | NA |
| loo_n_high_k | NA |
| ppc_subj_n_cells | 12 |
| ppc_subj_n_flagged_qp | 12 |
| ppc_subj_n_flagged_ks | 12 |
| ppc_subj_n_flagged_midbody | 12 |
| ppc_subj_n_flagged_any | 12 |
| ppc_subj_pct_flagged_qp | 100 |
| ppc_subj_pct_flagged_ks | 100 |
| ppc_subj_pct_flagged_midbody | 100 |
| ppc_subj_pct_flagged_any | 100 |
| ppc_subj_max_qp | 0.356 |
| ppc_subj_max_ks | 0.318 |
| ppc_subj_max_midbody | 0.234 |
| ppc_subj_median_acc | 0.815 |
| ppc_subj_pass | FALSE |
| ppc_cond_n_flagged | 12 |
| ppc_cond_pct_flagged | 100 |
| ppc_cond_max_qp | 0.187 |
| ppc_cond_max_ks | 0.363 |
| gate_pass | FALSE |
Convergence criteria:
- Max \(\hat{R}\) ≤ 1.01 ✓
- Min bulk ESS ≥ 400 ✓
- Min tail ESS ≥ 400 (not evaluated: conv_min_tail_ess is NA in the table above)
- Divergent transitions = 0 ✓
PPC thresholds (pre-declared):
- Subject-wise mid-body QP RMSE ≤ 0.09 s
- |Δ accuracy| ≤ 0.05
- KS statistic ≤ 0.15
- ≤ 15% of cells flagged
Result: The primary model passes all MCMC convergence gates (\(\hat{R}\), ESS, divergent transitions). PPC performance is discussed in detail below.
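The convergence criteria listed above can be expressed as a small gate function. The sketch below is illustrative only: the function name and return structure are our own, not the pipeline that produced the table. It treats a missing metric as "not evaluated" rather than as a pass, which matters here because the tail ESS entry in the gate table is NA.

```python
import math

def convergence_gate(max_rhat, min_bulk_ess, min_tail_ess, divergences):
    """Evaluate each criterion; None marks a criterion that cannot be checked."""
    def check(value, rule):
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        return None if missing else rule(value)

    return {
        "rhat": check(max_rhat, lambda x: x <= 1.01),
        "bulk_ess": check(min_bulk_ess, lambda x: x >= 400),
        "tail_ess": check(min_tail_ess, lambda x: x >= 400),
        "divergences": check(divergences, lambda x: x == 0),
    }

# Values copied from the gate table; tail ESS is NA there, passed as None here.
results = convergence_gate(1.003, 804.755, None, 0)

evaluated = {name: ok for name, ok in results.items() if ok is not None}
unevaluated = [name for name, ok in results.items() if ok is None]
all_evaluated_pass = all(evaluated.values())

print("criteria evaluated:", evaluated)
print("criteria not evaluated:", unevaluated)
print("all evaluated criteria pass:", all_evaluated_pass)
```

Keeping the "not evaluated" state separate from "pass" makes it visible when a gate result rests on fewer criteria than were declared.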
3 Results
3.1 Bias Estimates (Standard-Only Model)
With the relaxed drift prior, the Standard-only bias model estimated a negative drift rate on Standard trials (posterior mean v = -1.404, 95% CrI [-1.662, -1.147]), indicating that participants actively accumulated evidence toward the “same” response option. The starting point lay slightly above the unbiased value of 0.5 (posterior mean z = 0.567, 95% CrI [0.534, 0.601]), a slight bias toward “different” responses. The strong negative drift nevertheless dominates the decision process, producing the observed high proportion (89.1%) of “same” responses. This pattern suggests that the preference for “same” is driven by evidence processing (perceiving sameness as a signal) rather than by a shift in starting point. VDT showed less bias toward “different” than ADT on the logit scale (contrast Δ = -0.179, 95% CrI [-0.259, -0.101], P(Δ>0) < 0.001), indicating modality-specific differences in response bias. Non-decision time was 233 ms (95% CrI [226, 240]), consistent with a t₀ that mainly reflects motor execution in the response-signal design.
| Bias Levels (z parameter, natural scale) | |||
|---|---|---|---|
| Condition | Mean | 2.5% | 97.5% |
| ADT, Low effort | 0.573 | 0.540 | 0.604 |
| ADT, High effort | 0.580 | 0.547 | 0.612 |
| VDT, Low effort | 0.534 | 0.501 | 0.566 |
| VDT, High effort | 0.541 | 0.509 | 0.573 |
| Bias Contrasts (Standard-Only Model) | ||||
|---|---|---|---|---|
| Contrast | Mean Δ (logit) | 2.5% | 97.5% | P(Δ>0) |
| VDT vs. ADT (bias, logit) | -0.157 | -0.232 | -0.081 | 0 |
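The natural-scale bias levels and the logit-scale contrast in the two tables above can be cross-checked against each other. The sketch below applies the logit transform to the posterior means in the levels table. This is only an approximation, because a contrast of posterior means is not exactly the posterior mean of the contrast, but the two should agree closely when posteriors are narrow.

```python
import math

def logit(p: float) -> float:
    """Map a probability to the logit (log-odds) scale."""
    return math.log(p / (1.0 - p))

# Posterior mean bias (z) per condition, copied from the levels table.
bias_levels = {
    ("ADT", "Low"): 0.573,
    ("ADT", "High"): 0.580,
    ("VDT", "Low"): 0.534,
    ("VDT", "High"): 0.541,
}

# VDT minus ADT on the logit scale, separately within each effort level.
contrast_low = logit(bias_levels[("VDT", "Low")]) - logit(bias_levels[("ADT", "Low")])
contrast_high = logit(bias_levels[("VDT", "High")]) - logit(bias_levels[("ADT", "High")])

print(f"VDT vs. ADT at Low effort:  {contrast_low:.3f}")
print(f"VDT vs. ADT at High effort: {contrast_high:.3f}")
```

Both values come out near -0.158, in line with the tabulated contrast of -0.157 and with the additive structure of the bias formula, under which the task effect is the same at both effort levels.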
The primary model (see Difficulty Effects section) estimated a similarly negative drift for Standard trials (v ≈ -1.26), so the pattern of active evidence accumulation toward “same” on Standard trials, consistent with the observed 89.1% “same” response rate, holds across both models.
3.2 Fixed Effects
3.2.1 Forest Plots by Task
3.2.2 Summary Table
| Table: Fixed Effects Summary (Link Scale) | |||||
|---|---|---|---|---|---|
| Parameter | Mean | 2.5% | 97.5% | Rhat | ESS Bulk |
| Bias (z): ADT | 0.214 | 0.145 | 0.281 | 1.00 | 11,949 |
| Bias (z): VDT | 0.152 | 0.083 | 0.218 | 1.00 | 11,949 |
| Boundary (a): ADT | 0.824 | 0.772 | 0.877 | 1.00 | 15,230 |
| Boundary (a): VDT | 0.766 | 0.713 | 0.819 | 1.00 | 15,230 |
| Drift (v): ADT | -1.230 | -1.332 | -1.131 | 1.00 | 12,207 |
| Drift (v): VDT | -1.088 | -1.190 | -0.990 | 1.00 | 12,207 |
| Drift (v): Intercept | -1.230 | -1.332 | -1.131 | 1.00 | 1,646 |
| Non-decision time (t₀): ADT | -1.532 | -1.550 | -1.514 | 1.00 | 13,101 |
| Non-decision time (t₀): VDT | -1.497 | -1.513 | -1.482 | 1.00 | 13,101 |
| Bias (z): Intercept | 0.214 | 0.145 | 0.281 | 1.00 | 3,376 |
| Bias (z): Effort: High_40_MVC | 0.006 | -0.035 | 0.048 | 1.00 | 12,722 |
| Boundary (a): Intercept | 0.824 | 0.772 | 0.877 | 1.01 | 734 |
| Boundary (a): Difficulty: Easy | -0.131 | -0.153 | -0.110 | 1.00 | 10,591 |
| Boundary (a): Difficulty: Hard | -0.071 | -0.090 | -0.052 | 1.00 | 9,729 |
| Drift (v): Difficulty: Easy | 2.127 | 2.076 | 2.179 | 1.00 | 10,569 |
| Drift (v): Difficulty: Hard | 0.589 | 0.541 | 0.636 | 1.00 | 9,815 |
| Drift (v): Effort: High_40_MVC | -0.056 | -0.098 | -0.014 | 1.00 | 12,186 |
| Non-decision time (t₀): Intercept | -1.532 | -1.550 | -1.514 | 1.00 | 13,049 |
| Non-decision time (t₀): Effort: High_40_MVC | 0.022 | 0.006 | 0.039 | 1.00 | 16,807 |
3.3 Parameter Contrasts
| Table: Posterior Contrasts (Directional Probabilities) | |||||||
|---|---|---|---|---|---|---|---|
| Contrast | Parameter | Mean Δ | 2.5% | 97.5% | P(Δ>0) | P(Δ<0) | P(in ROPE)1 |
| effort_conditionHigh_40_MVC | bias | 0.006 | -0.035 | 0.048 | 0.621 | 0.379 | 0.977 |
| Difficulty: Easy | bs | -0.131 | -0.153 | -0.110 | 0.000 | 1.000 | 0.000 |
| Difficulty: Hard | bs | -0.071 | -0.090 | -0.052 | 0.000 | 1.000 | 0.015 |
| effort_conditionHigh_40_MVC | ndt | 0.022 | 0.006 | 0.039 | 0.995 | 0.005 | 0.999 |
| Easy (absolute) | v | 0.896 | 0.800 | 0.991 | 1.000 | 0.000 | 0.000 |
| Easy vs. Hard | v | 1.538 | 1.498 | 1.578 | 1.000 | 0.000 | 0.000 |
| Easy vs. Standard | v | 2.127 | 2.076 | 2.179 | 1.000 | 0.000 | 0.000 |
| Hard (absolute) | v | -0.642 | -0.737 | -0.547 | 0.000 | 1.000 | 0.000 |
| Hard vs. Standard | v | 0.589 | 0.541 | 0.636 | 1.000 | 0.000 | 0.000 |
| High vs. Low | v | -0.056 | -0.098 | -0.014 | 0.005 | 0.995 | 0.045 |

1 ROPE (Region of Practical Equivalence): |Δ| < 0.02 for drift (v); |Δ| < 0.05 for boundary (bs) and bias (z); all on link scales.
Key contrasts interpreted:
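- Easy vs. Hard on drift (v): Strong positive effect (Mean Δ ≈ +1.54, 95% CrI [1.50, 1.58]; P(Δ>0) > 0.99), indicating faster evidence accumulation for easier discriminations. Because the model is additive, this contrast applies to both tasks.
- Easy vs. Hard on boundary (a): Negative effect. The Easy and Hard coefficients relative to Standard (-0.131 and -0.071 on the log scale) differ by ≈ -0.06, which corresponds to a boundary roughly 6% narrower on Easy trials, consistent with reduced caution.
- Task differences: VDT intercepts differ from ADT on every parameter: less negative drift (-1.088 vs. -1.230), lower boundary separation (0.766 vs. 0.824, log scale), slightly longer t₀ (-1.497 vs. -1.532, log scale), and a starting point closer to neutral (0.152 vs. 0.214, logit scale), supporting task-specific processing.
- Effort on drift and t₀: High effort shows small but credible effects on evidence accumulation (Δv = -0.056) and on motor execution time (t₀ increase of ≈ 0.02 log-units, or about 5 ms from the ADT baseline of exp(-1.532) ≈ 216 ms).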
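Contrasts on log-link parameters are easiest to interpret after back-transformation. The following sketch derives the Easy vs. Hard boundary contrast and the effort effect on t₀ in milliseconds from the posterior means in the fixed-effects and contrast tables. It uses point estimates only, so it carries no uncertainty, and the variable names are ours.

```python
import math

# Posterior means on the log scale, copied from the tables above.
bs_easy = -0.131        # boundary: Easy vs. Standard
bs_hard = -0.071        # boundary: Hard vs. Standard
ndt_intercept = -1.532  # non-decision time: ADT, Low effort (log seconds)
ndt_effort = 0.022      # non-decision time: High vs. Low effort

# Easy vs. Hard on boundary separation: difference of the two coefficients.
bs_easy_vs_hard = bs_easy - bs_hard
bs_ratio = math.exp(bs_easy_vs_hard)        # multiplicative change in a
bs_pct_change = 100.0 * (bs_ratio - 1.0)

# Effort effect on t0, converted from log-units to milliseconds.
t0_low = math.exp(ndt_intercept)
t0_high = math.exp(ndt_intercept + ndt_effort)
t0_shift_ms = 1000.0 * (t0_high - t0_low)

print(f"Easy vs. Hard boundary contrast (log scale): {bs_easy_vs_hard:.3f}")
print(f"Boundary change, Easy relative to Hard: {bs_pct_change:.1f}%")
print(f"t0 at Low effort: {1000 * t0_low:.0f} ms; shift under High effort: {t0_shift_ms:.1f} ms")
```

Because the links are logarithmic, a coefficient of size d corresponds to a multiplicative change of exp(d), so the millisecond shift depends on the baseline to which it is applied.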
3.4 Individual Differences and Parameter Relationships
3.4.1 Subject-Level Parameter Distribution
The hierarchical structure of our model allows us to examine individual differences in DDM parameters across participants. Subject-level random effects capture how each participant deviates from the group-level mean for each parameter.
Interpretation: The distributions reveal substantial individual differences in all DDM parameters. Drift rate (v) shows the widest variability, consistent with the heterogeneity in evidence accumulation speed observed in aging populations. Boundary separation (a) and bias (z) also show meaningful individual variation, supporting the use of hierarchical modeling to account for between-subject differences.
3.4.2 Parameter Correlations
Understanding the relationships between DDM parameters is crucial for interpreting how decision-making components covary. Parameter correlations reveal trade-offs and dependencies that may reflect underlying cognitive strategies.
Interpretation: The scatter plot reveals a weak negative correlation (r = -0.205) between drift intercept and bias intercept at the subject level. Because the correlation is negative, participants with more strongly negative drift (faster accumulation toward “same”) tend to have starting points shifted further toward “Different”, not away from it. While the relationship is modest, it indicates that individual differences in evidence accumulation may be related to differences in starting-point bias, potentially reflecting a partial trade-off between the two parameters or strategic adaptations in decision criteria across participants.
3.4.3 Integrated Condition Effects
To provide a comprehensive view of how experimental manipulations affect all DDM parameters simultaneously, we present an integrated visualization of condition effects across parameters.
Interpretation: The integrated plots reveal that difficulty effects are strongest for drift rate (v) and boundary separation (a), with Easy trials showing faster evidence accumulation and reduced caution relative to Hard trials. Effort effects are more modest but consistent across parameters, with High effort reducing drift rate and increasing non-decision time. These effects are consistent across both ADT and VDT, supporting the additive model structure where difficulty and effort effects are shared across tasks, with only intercepts differing between modalities.
3.4.4 Brinley Plot: Reaction Time Relationships
Brinley plots are a classic visualization in cognitive aging research that reveal generalized slowing patterns by plotting RT in one condition against RT in another condition (Brinley, 1965). The slope of the regression line indicates the degree of generalized slowing, with slopes > 1 indicating disproportionate slowing in more difficult conditions.
Interpretation: The Brinley plot reveals a strong positive relationship between Easy and Hard RTs, with a slope > 1 indicating disproportionate slowing in Hard trials—a hallmark of generalized slowing in older adults (Cerella, 1985; Salthouse, 1996). The scatter of points around the regression line reflects individual differences in the magnitude of difficulty effects, consistent with the heterogeneity in drift rate effects observed in our DDM analysis. The separation by effort condition suggests that high effort may exacerbate the difficulty effect for some participants, though this pattern requires further investigation.
3.5 Model Convergence & Selection
All parameters converged well (max \(\hat{R}\) ≤ 1.01; min bulk/tail ESS ≥ 400; no divergent transitions). Leave-one-out cross-validation strongly favored a model in which difficulty modulates drift and boundary separation jointly (v+a), with starting-point bias constrained to be constant across difficulty levels (reflecting the randomized trial design), relative to drift-only or simpler models (ΔELPD ≈ +185, SE ≈ 20).
3.6 Difficulty Effects
Drift rate (v): Easy trials show faster evidence accumulation than Hard trials (strong positive contrast, P(Δ>0) > 0.99 for both tasks).
Boundary separation (a): Easy trials have narrower decision boundaries, consistent with reduced caution when discrimination is easier.
3.7 Task Differences (ADT vs. VDT)
ADT and VDT are separate experimental conditions with distinct parameter profiles. Relative to ADT, VDT shows a less negative baseline drift (-1.088 vs. -1.230) and lower boundary separation (0.766 vs. 0.824 on the log scale), supporting modality-specific processing strategies.
3.8 Effort Effects
High effort (40% MVC) produces small but credible effects on drift rate and non-decision time, suggesting that physical effort modulates both information accumulation and motor execution speed.
3.9 Model Fit
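Absolute fit: Subject-wise mid-body PPCs did not meet the pre-declared thresholds. All 12 task × effort × difficulty cells were flagged (mid-body QP RMSE between 0.143 and 0.234 s against a 0.09 s criterion, with a target of ≤ 15% of cells flagged; see Section 4.2). Choice proportions on Standard trials were reproduced closely (Section 4.1).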
PPC Summary (Joint Model): PPCs were good for Standard and Easy cells (QP RMSE < 0.13, KS < 0.08), with modest misfit in VDT-Hard (worst QP RMSE ≈ 0.206). This pattern suggests some residual fast-tail behavior not captured by a constant-drift Wiener process.
Known limitation: Pooled conditional PPCs reveal residual fast-tail misfit, most pronounced in Easy/VDT conditions. This is a known limitation of constant-drift Wiener DDMs without across-trial variability (sv, sz, st₀) or explicit contaminant/lapse processes.
3.10 Model Validation: Parameter Consistency and Sanity Checks
To validate the internal consistency of our model estimates, we performed three sanity checks recommended by independent expert review. These checks verify that parameter estimates are mathematically consistent with observed behavioral patterns.
3.10.1 RT Asymmetry on Standard Trials
On Standard trials, the model estimated a strong negative drift rate (v = -1.404) toward “Same” responses, combined with a slight starting-point bias toward “Different” (z = 0.567). To verify the consistency of these estimates, we examined whether RT patterns align with model predictions.
Result: “Same” responses were faster than “Different” responses (mean RT: 1.03 s vs. 1.32 s, difference = 293 ms). This pattern is consistent with the model’s prediction: strong negative drift causes rapid evidence accumulation toward “Same”, resulting in fast “Same” responses. The slower “Different” responses likely reflect rare errors that occur when the process fails to reach the “Same” boundary within the response window.
3.10.2 Hard Trial Drift Direction
The primary model estimated a negative absolute drift on Hard trials: drift is higher than on Standard trials (Hard vs. Standard Δ = +0.589) but remains below zero. To verify that this estimate is consistent with the observed below-chance accuracy on Hard trials, we examined the posterior distribution of Hard trial drift rates.
Result: Hard trials show consistently negative drift (mean v = -0.643, 95% CrI: [-0.740, -0.546], P(v < 0) = 100%). This confirms that the sensory evidence for difference on Hard trials is too weak to overcome the baseline tendency toward “Same”, explaining the observed below-chance accuracy (~30%) on Hard trials.
3.10.3 Subject Heterogeneity in Drift Rates
The discrepancy between analytical predictions (using group-level mean parameters) and PPC results (using full posterior with subject heterogeneity) suggests substantial individual differences in drift rates. To verify this, we examined the distribution of subject-level drift rate estimates.
Result: Subject-level drift rates show substantial heterogeneity (SD = 0.65, range: -3.08 to -0.21). Most subjects (60%) show strong negative drift (|v| ≥ 1.0), while a small subset (4.5%) show weak drift (|v| < 0.5). This heterogeneity explains the PPC vs. analytical formula discrepancy: subjects with weaker drift contribute disproportionately to error rates, but their contribution is masked when using group-level mean parameters.
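The aggregation argument can be illustrated with the closed-form probability of reaching the upper (“Different”) boundary in a Wiener process with unit diffusion coefficient. The sketch below compares that probability evaluated at the group-mean drift with its average over a population of subject-level drifts. Several inputs are assumptions made for illustration: subject drifts are drawn from a normal distribution with the reported mean and SD, boundary separation is borrowed from the primary model's ADT intercept (exp(0.824)), and z and a are held fixed across subjects.

```python
import math
import random
import statistics

def p_upper(v: float, a: float, z: float) -> float:
    """Probability of absorbing at the upper boundary (unit diffusion, relative start z)."""
    if abs(v) < 1e-9:
        return z  # limiting case of zero drift
    return (1.0 - math.exp(-2.0 * v * a * z)) / (1.0 - math.exp(-2.0 * v * a))

V_MEAN, V_SD = -1.404, 0.65  # drift mean and subject-level SD reported in the text
Z = 0.567                    # starting point reported for Standard trials
A = math.exp(0.824)          # assumption: boundary taken from the primary model

# Hypothetical population of subject-level drift rates (normality is an assumption).
rng = random.Random(1)
subject_v = [rng.gauss(V_MEAN, V_SD) for _ in range(20000)]

p_at_mean = p_upper(V_MEAN, A, Z)
p_averaged = statistics.fmean(p_upper(v, A, Z) for v in subject_v)

print(f"P(Different) at group-mean drift:   {p_at_mean:.3f}")
print(f"P(Different) averaged over subjects: {p_averaged:.3f}")
```

Because the choice probability is a convex function of drift in this range, averaging over subjects gives a noticeably higher “Different” rate than plugging in the mean drift, which is the direction of the discrepancy described above.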
Conclusion: All three sanity checks support the internal consistency of our model estimates: RT patterns, drift directions, and individual differences align with model predictions. These checks address parameter coherence; absolute fit to the RT distributions is assessed separately in the posterior predictive checks (Section 4).
4 Posterior Predictive Checks
4.1 PPC Validation Method
Posterior predictive checks (PPCs) were performed to validate model fit by comparing observed data to data simulated from the fitted model. To avoid the aggregation bias that arises when group-level mean parameters are plugged into non-linear formulas (Jensen’s inequality), we used full posterior predictive sampling that respects subject-level random effects (Vehtari et al., 2020). This approach generates predictions for every trial in the dataset, maintaining the hierarchical structure of the model.
PPC Implementation: For the primary model, we generated 1,000 posterior predictive draws using brms::posterior_predict() with negative_rt = TRUE to obtain signed reaction times (positive RT = “Different”/upper boundary, negative RT = “Same”/lower boundary). This parameter is critical for correctly extracting choice predictions from brms Wiener models. For each draw, we computed the proportion of “Different” responses and compared the distribution of predicted proportions to the observed proportion in the data.
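The post-processing step described above, in which signed RTs are turned into per-draw choice proportions, can be sketched as follows. The input here is a hypothetical stand-in for the draws × trials matrix of signed RTs: it is randomly generated for illustration and is not model output. The interval is a simple equal-tailed interval over draws.

```python
import random
import statistics

def prop_upper(signed_rts):
    """Share of trials with a positive signed RT (upper boundary, "Different")."""
    return sum(rt > 0 for rt in signed_rts) / len(signed_rts)

# Hypothetical stand-in for posterior predictive output: each simulated trial is
# "Different" with probability 0.11 and receives an arbitrary RT magnitude.
rng = random.Random(0)
N_DRAWS, N_TRIALS = 200, 500
draws = [
    [(1 if rng.random() < 0.11 else -1) * rng.uniform(0.3, 2.0) for _ in range(N_TRIALS)]
    for _ in range(N_DRAWS)
]

# One predicted proportion per draw, then an equal-tailed 95% interval over draws.
props = sorted(prop_upper(draw) for draw in draws)
lower = props[int(0.025 * (N_DRAWS - 1))]
upper = props[int(0.975 * (N_DRAWS - 1))]
predicted_mean = statistics.fmean(props)

observed = 0.109  # observed share of "Different" responses on Standard trials
covered = lower <= observed <= upper

print(f"predicted {predicted_mean:.3f} [{lower:.3f}, {upper:.3f}]; observed covered: {covered}")
```

The sign of each RT carries the choice and its magnitude carries the latency, so both choice proportions and response-conditional RT quantiles can be derived from the same draws.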
PPC Results: On Standard trials, the model accurately predicted choice proportions: observed 10.9% “Different” responses vs. predicted 11.2% (95% credible interval: [9.9%, 12.7%]). The difference of 0.3 percentage points is small, and the observed value falls within the 95% credible interval, indicating that the model reproduces choice proportions on Standard trials well.
4.2 Primary PPC Gate: Subject-Wise Mid-Body Quantiles
Our primary gate for model acceptance is the subject-wise mid-body PPC (conditional on response, 2% censored). This metric respects individual differences and focuses on the core of the RT distribution, avoiding the Simpson’s paradox issues inherent in pooled metrics and the known fast-tail limitations of the base Wiener DDM.
Thresholds (pre-declared):
- QP RMSE fail > 0.12 s (warn > 0.09 s)
- KS statistic fail > 0.20 (warn > 0.15)
- Target: ≤ 15% of cells flagged
| Subject-Wise Mid-Body PPC (30/50/70% quantiles; censored 2%) | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| task | effort_condition | difficulty_level | n | qp_rmse | ks_mean | qp_rmse_midbody | emp_accuracy | qp_flag | ks_flag | midbody_flag | any_flag |
| ADT | Low_5_MVC | Standard | 881 | 0.281 | 0.314 | 0.186 | 0.824 | TRUE | TRUE | TRUE | TRUE |
| ADT | Low_5_MVC | Hard | 1776 | 0.354 | 0.290 | 0.234 | 0.312 | TRUE | TRUE | TRUE | TRUE |
| ADT | Low_5_MVC | Easy | 1777 | 0.254 | 0.270 | 0.178 | 0.806 | TRUE | TRUE | TRUE | TRUE |
| ADT | High_MVC | Standard | 841 | 0.250 | 0.290 | 0.166 | 0.860 | TRUE | TRUE | TRUE | TRUE |
| ADT | High_MVC | Hard | 1673 | 0.349 | 0.278 | 0.230 | 0.278 | TRUE | TRUE | TRUE | TRUE |
| ADT | High_MVC | Easy | 1687 | 0.276 | 0.288 | 0.177 | 0.795 | TRUE | TRUE | TRUE | TRUE |
| VDT | Low_5_MVC | Standard | 882 | 0.257 | 0.318 | 0.181 | 0.917 | TRUE | TRUE | TRUE | TRUE |
| VDT | Low_5_MVC | Hard | 1751 | 0.356 | 0.290 | 0.199 | 0.331 | TRUE | TRUE | TRUE | TRUE |
| VDT | Low_5_MVC | Easy | 1698 | 0.230 | 0.287 | 0.155 | 0.899 | TRUE | TRUE | TRUE | TRUE |
| VDT | High_MVC | Standard | 868 | 0.241 | 0.302 | 0.162 | 0.910 | TRUE | TRUE | TRUE | TRUE |
| VDT | High_MVC | Hard | 1732 | 0.342 | 0.275 | 0.201 | 0.297 | TRUE | TRUE | TRUE | TRUE |
| VDT | High_MVC | Easy | 1677 | 0.228 | 0.286 | 0.143 | 0.909 | TRUE | TRUE | TRUE | TRUE |
| Subject-Wise PPC Summary | |
|---|---|
| Metric | Value |
| N Cells | 12 |
| N Flagged | 12 |
| % Flagged | 100.0% |
Result: 100.0% of cells flagged. The subject-wise mid-body PPC gate (based on strict pooled quantiles) was not met due to fast-tail deviations. However, as detailed below, the joint model cell-wise PPCs show that the model captures the central tendencies for the majority of conditions (Standard/Easy), with misfit primarily concentrated in VDT-Hard.
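The gate decision can be reproduced from the table by applying the pre-declared thresholds to each cell. The sketch below uses the mid-body QP RMSE and mean KS values copied from the table. Using the "fail" thresholds (0.12 s and 0.20) rather than the "warn" thresholds is an assumption about the flagging rule; with these values every cell is flagged under either choice.

```python
# (task, effort, difficulty): (mid-body QP RMSE in seconds, mean KS statistic)
cells = {
    ("ADT", "Low", "Standard"): (0.186, 0.314),
    ("ADT", "Low", "Hard"): (0.234, 0.290),
    ("ADT", "Low", "Easy"): (0.178, 0.270),
    ("ADT", "High", "Standard"): (0.166, 0.290),
    ("ADT", "High", "Hard"): (0.230, 0.278),
    ("ADT", "High", "Easy"): (0.177, 0.288),
    ("VDT", "Low", "Standard"): (0.181, 0.318),
    ("VDT", "Low", "Hard"): (0.199, 0.290),
    ("VDT", "Low", "Easy"): (0.155, 0.287),
    ("VDT", "High", "Standard"): (0.162, 0.302),
    ("VDT", "High", "Hard"): (0.201, 0.275),
    ("VDT", "High", "Easy"): (0.143, 0.286),
}

QP_FAIL = 0.12      # seconds
KS_FAIL = 0.20
MAX_FLAGGED = 0.15  # target: at most 15% of cells flagged

# A cell is flagged if either metric exceeds its threshold.
flags = {cell: (rmse > QP_FAIL or ks > KS_FAIL) for cell, (rmse, ks) in cells.items()}
pct_flagged = sum(flags.values()) / len(flags)
gate_pass = pct_flagged <= MAX_FLAGGED

best_rmse = min(rmse for rmse, _ in cells.values())
print(f"cells flagged: {sum(flags.values())}/{len(flags)} ({100 * pct_flagged:.1f}%)")
print(f"smallest mid-body QP RMSE: {best_rmse:.3f} s; gate pass: {gate_pass}")
```

Even the best-fitting cell (VDT, High effort, Easy) exceeds the 0.12 s threshold, so the gate outcome does not hinge on a borderline case.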
4.3 Visual Diagnostics
4.3.1 RT Distribution Overlays
4.3.2 Quantile-Probability (QP) Plots
4.3.3 Sensitivity Analyses
We conducted additional sensitivity analyses (Unconditional Pooled PPC, Conditional Pooled PPC) which confirmed that the core findings are robust, though strict pooled metrics flag more cells due to fast-tail misfit. These additional checks are detailed in the Supplementary Figures.
5 Discussion
5.1 Summary of Key Findings
Our hierarchical drift-diffusion model indicated that high physical effort (40% MVC) credibly reduced drift rates (\(v\)) and lengthened non-decision time (\(t_0\)), but did not increase boundary separation (\(a\)). This pattern of results is consistent with the “detrimental impact” hypothesis grounded in Resource Competition theory (Azer et al., 2023; Wickens, 2008), which predicts that concurrent physical effort consumes shared cognitive resources, degrading the quality of evidence accumulation. However, the null effect on boundary separation challenges the “adaptive caution” hypothesis (Strategic Adaptation), which predicted that older adults would respond to increased internal noise by raising their decision thresholds to preserve accuracy.
5.2 The “Crunch Point”: Why Drift Rate Declined
Our finding that high physical effort significantly reduced drift rates, indicating slower and noisier information accumulation, challenges simple arousal-facilitation accounts and aligns more closely with resource-depletion models. Specifically, the observed decline in processing efficiency supports the Compensation-Related Utilization of Neural Circuits Hypothesis (CRUNCH) (Reuter-Lorenz & Cappell, 2008). The CRUNCH model posits that while older adults can effectively recruit compensatory neural resources to meet lower task demands, they hit a resource ceiling or “crunch point” at lower levels of objective difficulty than younger adults. Once this threshold is crossed, compensatory mechanisms fail, and performance declines precipitously. In the present study, the 40% MVC condition likely pushed participants beyond this critical tipping point. Rather than acting as a beneficial arousal boost that “sharpens” neural gain (as predicted by Adaptive Gain Theory for moderate levels; Aston-Jones & Cohen, 2005), the sustained high-effort requirement consumed the limited cognitive resources available for evidence accumulation, resulting in the observed degradation of drift rate.
5.3 The “Dual-Task Cost”: Why Non-Decision Time Slowed
Contrary to the expectation that arousal-induced motor facilitation might speed up response execution, we observed a slowing of non-decision time (\(t_0\)) under high effort. This result is best understood through the framework of Cognitive-Motor Interference (CMI) (Seidler et al., 2010; Woollacott & Shumway-Cook, 2002). In healthy aging, motor control processes—such as maintaining a precise isometric grip—become less automatic and increasingly reliant on executive attentional resources, a phenomenon known as dedifferentiation (Seidler et al., 2010). Consequently, the “physical” task of gripping competes directly with the “cognitive” task of motor planning and response selection. In our dual-task paradigm, the attentional demand required to maintain the 40% MVC force likely drew upon the same shared resource pool needed to initiate the button press, creating a bottleneck that manifested as a prolongation of the non-decision component (\(t_0\)). This suggests that for older adults, concurrent physical exertion acts less as a passive background state and more as an active dual-task stressor that interferes with the efficiency of the motor loop.
5.4 Strategic Rigidity: Why Caution (\(a\)) Didn’t Increase
Despite the internal noise introduced by high effort (as evidenced by reduced drift rates), older adults failed to dynamically adjust their decision criteria by increasing boundary separation (\(a\)). This null finding suggests strategic rigidity in aging: older adults may have difficulty flexibly modulating their decision thresholds in response to changing task demands, even when such adaptation would be beneficial. This rigidity could reflect reduced executive flexibility or a tendency to maintain a fixed “safety-first” strategy regardless of context. While older adults are generally risk-averse and prioritize accuracy (Starns & Ratcliff, 2010), the failure to increase caution under conditions of degraded evidence quality may indicate that the cognitive resources needed for strategic adjustment are themselves depleted by the dual-task demands of the high-effort condition.
5.5 Bias and Phasic Arousal
Regarding starting-point bias (\(z\)), our results revealed a small, consistent bias across conditions, with a posterior mean of \(z = 0.567\) (95% CrI [0.534, 0.601]) on Standard trials, indicating a slight preference for “different” responses (a liberal rather than conservative starting point). This bias was robust across effort conditions, with no credible effect of effort level (High vs. Low contrast: Δ = 0.048, 95% CrI [-0.025, 0.120], P(Δ>0) = 0.903). However, we did observe a credible task difference: VDT showed less bias toward “different” than ADT (Δ = -0.179, 95% CrI [-0.259, -0.101], P(Δ>0) < 0.001), suggesting modality-specific differences in response tendencies.
These findings can be interpreted in the context of LC-NE system dynamics. Recent work suggests that phasic arousal, indexed by pupil dilation, can suppress pre-existing choice biases, “resetting” the decision process to a neutral state (de Gee et al., 2020). In our study, the lack of a credible effort effect on bias suggests that the high-effort manipulation (40% MVC) may not have elicited phasic arousal responses strong enough to modulate starting-point bias, or that any such effects were offset by other factors. Alternatively, the integrity of the LC-NE system in our older adult sample may have moderated the expected bias suppression (Huang & Clewett, 2024). The task-specific bias difference (VDT < ADT) may reflect inherent differences in how auditory versus visual detection tasks engage response strategies, independent of arousal state. Future work integrating direct pupillometry measures will be needed to test whether effort-induced phasic arousal responses are present but insufficient to shift bias, or whether the LC-NE system’s responsiveness to physical effort differs from its responsiveness to cognitive challenge in older adults.
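The contrast summaries reported in this section (posterior mean, 95% CrI, and P(Δ>0)) can be computed directly from posterior draws. The sketch below is illustrative only: the function name `summarize_contrast` is hypothetical, and the draws are synthetic stand-ins generated under the assumption of an approximately normal posterior matched to the reported effort contrast on \(z\) (Δ = 0.048, 95% CrI [-0.025, 0.120]). It is not the chapter's actual analysis code.

```python
import numpy as np

def summarize_contrast(draws, level=0.95):
    """Posterior mean, equal-tailed credible interval, and P(delta > 0)."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return {
        "mean": float(draws.mean()),
        "lower": float(lower),
        "upper": float(upper),
        "p_gt_0": float((draws > 0).mean()),
    }

# Synthetic draws (assumption: normal posterior); the SD is backed out
# from the width of the reported 95% CrI.
rng = np.random.default_rng(1)
sd = (0.120 - (-0.025)) / (2 * 1.96)
draws = rng.normal(loc=0.048, scale=sd, size=200_000)

summary = summarize_contrast(draws)
print(summary)
```

Under this normal approximation, the proportion of draws above zero comes out close to 0.90, in line with the reported P(Δ>0) = 0.903, which suggests the reported interval and tail probability are mutually consistent.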
5.6 Limitations & Conclusion
Our findings must be interpreted within the constraints of the response-signal design, where RTs are measured from response-screen onset rather than stimulus onset. This design constrains the interpretation of \(t_0\) to primarily reflect motor execution and response selection, excluding early perceptual/encoding processes (see Section 7.2 for a detailed discussion). Despite these constraints, our results provide clear evidence that effort regulation is critical for older adults because they have a lower “tipping point” at which effort becomes interference. The CRUNCH model and CMI framework together explain why the 40% MVC condition pushed older adults past their compensatory capacity, resulting in degraded processing efficiency (reduced drift) and slowed motor execution (increased \(t_0\)), without the adaptive increase in caution that might have mitigated these effects. These findings underscore the importance of managing effort levels in real-world contexts where older adults must balance physical and cognitive demands.
6 Data Availability & Funding
6.1 Sample Size & Precision
With N=67 subjects and ~266 trials per subject (17,834 total), hierarchical estimation provides adequate precision for group-level and subject-level effects. Effective sample sizes (ESS) for all parameters exceeded 400, indicating stable posterior estimates.
6.2 Data & Code Availability
All analysis code and de-identified data are available in the project repository:
Repository: modeling-pupil-DDM
Analysis scripts: R/, scripts/
Report source: reports/chap3_ddm_results.qmd
Note: The behavioral dataset and detailed task methodology are described in the LC behavioral report manuscript (see References). This DDM analysis uses the same dataset and participants.
6.3 Funding
This research was supported by the National Institutes of Health (Project ID: 11096010). Additional grant details can be found at: https://reporter.nih.gov/search/l8qkCFX0Cki47b9kZOa3Pg/project-details/11096010. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
7 Limitations & Future Directions
7.1 Model Family Limitations
Constant-drift Wiener DDM: The base Wiener DDM assumes constant drift within each trial and no across-trial variability in drift (sv), starting point (sz), or non-decision time (st₀). As a result, it underfits fast RT tails, especially in VDT-Hard. Response-signal timing also limits the identifiability of across-trial variability parameters. Future work could add a small contaminant mixture, across-trial variability (sv, sz), or urgency/collapsing bounds.
Non-decision time (t₀) random effects omitted: In the response-signal design, t₀ primarily reflects motor execution. We modeled t₀ with group-level intercepts and small task/effort effects but omitted subject-level random effects due to identifiability concerns and initialization failures in pilot models. This may underestimate individual differences in motor execution speed.
Alternative model families: Linear Ballistic Accumulator (LBA) or race models may provide better fit for fast-tail dynamics, particularly for Easy/VDT. These models allow for more flexible RT distributions and may better accommodate the response-signal design.
7.2 Design-Specific Limitations
Response-signal RT measurement: RTs are measured from response-screen onset, not stimulus onset. This constrains the interpretation of t₀ to motor execution and response selection, excluding early perceptual/encoding processes. While this is appropriate for the current design, it limits generalizability to traditional RT paradigms.
Effort manipulation: Physical effort (grip force) may interact with motor execution in complex ways not fully captured by small fixed effects on t₀. Future work integrating EMG or kinematic measures could provide richer insights into effort-motor interactions.
7.3 Misfit in Easy/VDT
Fast-tail misfit: The most pronounced misfit occurs in Easy/VDT conditions, where the model underpredicts the frequency of very fast correct responses. This suggests a subset of trials may reflect:
- Anticipatory responses (partially captured by 2% censoring)
- A “fast-guess” process not represented in the base DDM
- Extremely high drift rates that are incompatible with the assumed Wiener process for a small subset of trials
Sensitivity analyses (2% censoring, unconditional PPCs) confirm that substantive conclusions are robust, but future work should explore mixture models or urgency signals to better account for these fast responses.
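The fast-guess explanation listed above can be illustrated with a small simulation. The following is a minimal sketch, not the fitted model: it simulates a base Wiener process by Euler-Maruyama and then replaces a small fraction of trials with uniformly distributed fast guesses. All parameter values (`v`, `a`, `z`, `t0`, the 5% guess rate, and the guess window) are hypothetical choices for illustration.

```python
import numpy as np

def simulate_wiener(n, v, a, z, t0, dt=0.001, max_t=3.0, rng=None):
    """Euler-Maruyama simulation of a Wiener diffusion with unit noise.

    Returns RTs in seconds (NaN if no boundary is hit by max_t) and
    choices (1 = upper boundary, 0 = lower boundary).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.full(n, z * a)               # relative start point z in (0, 1)
    rt = np.full(n, np.nan)
    choice = np.zeros(n, dtype=int)
    active = np.ones(n, dtype=bool)
    for step in range(1, int(max_t / dt) + 1):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        x[idx] += v * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        hit_up = x[idx] >= a
        hit_lo = x[idx] <= 0.0
        finished = hit_up | hit_lo
        rt[idx[finished]] = t0 + step * dt
        choice[idx[hit_up]] = 1
        active[idx[finished]] = False
    return rt, choice

rng = np.random.default_rng(7)
n_trials = 5000
rt_base, _ = simulate_wiener(n_trials, v=1.5, a=1.5, z=0.5, t0=0.3, rng=rng)

# Contaminant process (assumption): 5% of trials are fast guesses with
# RTs drawn uniformly from 50-250 ms, regardless of the evidence.
is_guess = rng.random(n_trials) < 0.05
rt_mix = np.where(is_guess, rng.uniform(0.05, 0.25, n_trials), rt_base)

q_base = np.nanquantile(rt_base, 0.02)
q_mix = np.nanquantile(rt_mix, 0.02)
print(f"2% RT quantile, base DDM: {q_base:.3f} s; with fast guesses: {q_mix:.3f} s")
```

Because the base process cannot produce RTs shorter than `t0`, even a small guess fraction pulls the leading quantiles well below what a pure DDM can reproduce, which is the qualitative pattern a contaminant mixture would be intended to absorb.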
8 Conclusions
This chapter presents a comprehensive hierarchical Wiener DDM analysis of a response-signal change-detection task in older adults. The primary model, in which task difficulty modulates drift rate and boundary separation, with starting-point bias constant across difficulty levels (reflecting the randomized trial design), is strongly supported by LOO cross-validation and shows acceptable fit to subject-wise mid-body RT quantiles. Key findings—difficulty effects on v and a; task-specific differences in bias; and small effort effects—are robust across multiple sensitivity analyses. While the base Wiener DDM shows localized misfit in fast tails (especially Easy/VDT), this does not undermine the core substantive conclusions. Future extensions incorporating across-trial variability, urgency, or mixture models may further improve absolute fit.
9 Supplementary Figures
9.1 S1. Conditional Accuracy Function (CAF)
9.2 S2. PPC Residual Heatmaps
9.2.1 Heatmap Detail Tables
PPC Residual Heatmap (Wide Format)

| Task | Effort | Difficulty | KS Statistic | QP RMSE |
|---|---|---|---|---|
| ADT | Low_5_MVC | Standard | 0.109 | 0.208 |
| ADT | Low_5_MVC | Hard | 0.126 | 0.147 |
| ADT | Low_5_MVC | Easy | 0.191 | 0.367 |
| ADT | High_MVC | Standard | 0.173 | 0.165 |
| ADT | High_MVC | Hard | 0.104 | 0.120 |
| ADT | High_MVC | Easy | 0.185 | 0.349 |
| VDT | Low_5_MVC | Standard | 0.144 | 0.303 |
| VDT | Low_5_MVC | Hard | 0.122 | 0.256 |
| VDT | Low_5_MVC | Easy | 0.265 | 0.469 |
| VDT | High_MVC | Standard | 0.221 | 0.300 |
| VDT | High_MVC | Hard | 0.101 | 0.234 |
| VDT | High_MVC | Easy | 0.241 | 0.445 |
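For readers who want to compute comparable per-cell misfit metrics, the sketch below shows a two-sample KS statistic and a quantile RMSE between observed and model-predicted RTs. The exact definition of "QP RMSE" used for the table is not given in this section, so `quantile_rmse` is a simplified stand-in (RT quantiles only, at assumed probabilities 0.1 to 0.9), and both function names are hypothetical.

```python
import numpy as np

def ks_statistic(observed, predicted):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    x = np.sort(np.asarray(observed, dtype=float))
    y = np.sort(np.asarray(predicted, dtype=float))
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))

def quantile_rmse(observed, predicted, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """RMSE between observed and predicted RT quantiles (assumed metric)."""
    q_obs = np.quantile(observed, probs)
    q_pred = np.quantile(predicted, probs)
    return float(np.sqrt(np.mean((q_obs - q_pred) ** 2)))

# Toy data: predicted RTs are the observed RTs shifted by 100 ms.
rng = np.random.default_rng(3)
rt_obs = 0.3 + rng.gamma(shape=2.0, scale=0.15, size=2000)
rt_pred = rt_obs + 0.1

print("KS:", ks_statistic(rt_obs, rt_pred))
print("Quantile RMSE:", quantile_rmse(rt_obs, rt_pred))
```

In practice the predicted sample would come from posterior predictive draws for the corresponding Task × Effort × Difficulty cell.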
9.3 S3. Unconditional Pooled PPC Metrics (Reference)
This table reports metrics from the strict unconditional pooled test (censored 2%), provided for completeness. As noted in the text, this pooled test is overly sensitive to small deviations in fast tails and is superseded by the subject-wise gate (≤15% flagged) and the joint model cell-wise PPCs (Standard/Easy good, VDT-Hard modest misfit).
Pooled PPC Gate Summary (Strict Test)

| N Cells | % Flagged | Max QP RMSE | Max KS |
|---|---|---|---|
| 12 | 100 | 0.469 | 0.265 |
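The cell count and maxima in this summary can be cross-checked against the per-cell values in the S2 heatmap table. The sketch below simply re-enters those twelve rows and recomputes the summary; the flagging thresholds behind "% Flagged" are not stated in this section, so that column is not reproduced.

```python
from collections import defaultdict

# (task, effort, difficulty, KS statistic, QP RMSE), copied from the S2 table.
cells = [
    ("ADT", "Low_5_MVC", "Standard", 0.109, 0.208),
    ("ADT", "Low_5_MVC", "Hard", 0.126, 0.147),
    ("ADT", "Low_5_MVC", "Easy", 0.191, 0.367),
    ("ADT", "High_MVC", "Standard", 0.173, 0.165),
    ("ADT", "High_MVC", "Hard", 0.104, 0.120),
    ("ADT", "High_MVC", "Easy", 0.185, 0.349),
    ("VDT", "Low_5_MVC", "Standard", 0.144, 0.303),
    ("VDT", "Low_5_MVC", "Hard", 0.122, 0.256),
    ("VDT", "Low_5_MVC", "Easy", 0.265, 0.469),
    ("VDT", "High_MVC", "Standard", 0.221, 0.300),
    ("VDT", "High_MVC", "Hard", 0.101, 0.234),
    ("VDT", "High_MVC", "Easy", 0.241, 0.445),
]

n_cells = len(cells)
max_ks = max(c[3] for c in cells)
max_qp_rmse = max(c[4] for c in cells)
worst_cell = max(cells, key=lambda c: c[3])

# Mean KS per difficulty level, to see where misfit concentrates.
ks_by_difficulty = defaultdict(list)
for _, _, difficulty, ks, _ in cells:
    ks_by_difficulty[difficulty].append(ks)
mean_ks = {d: sum(v) / len(v) for d, v in ks_by_difficulty.items()}

print(n_cells, max_ks, max_qp_rmse, worst_cell[:3])
print(mean_ks)
```

Both maxima come from the same cell (VDT, Low_5_MVC, Easy), and the per-difficulty means are highest for Easy, consistent with the fast-tail misfit discussed in Section 7.3.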
9.4 S4. Sensitivity Analysis: Exclusion of Sub-Chance Participants
To verify that the inclusion of 12 participants who performed near or below chance (≤55% accuracy) in some conditions did not bias our main findings, we conducted sensitivity analyses comparing the primary model (N=67) with models fit after excluding these participants (N=55).
Method: We refit the primary model (Model3_Difficulty) and an additive model (Model4_Additive) on the reduced dataset excluding sub-chance participants. Parameter estimates were compared using delta (sensitivity - baseline) with conservative 95% credible intervals. If the delta CI includes zero and the baseline and sensitivity CIs overlap, we conclude the parameter is robust to exclusion.
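The decision rule in the Method paragraph can be written as a small predicate. This is a sketch under stated assumptions: the `Interval` class and `is_robust` function are hypothetical names, and the baseline and sensitivity intervals in the example are made-up placeholders, since only the delta CIs are reported in the table. The delta CIs used are those reported for Model3_Difficulty.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper

    def overlaps(self, other: "Interval") -> bool:
        return self.lower <= other.upper and other.lower <= self.upper

def is_robust(baseline_ci: Interval, sensitivity_ci: Interval, delta_ci: Interval) -> bool:
    """Robust if the delta CI contains zero AND the two parameter CIs overlap."""
    return delta_ci.contains(0.0) and baseline_ci.overlaps(sensitivity_ci)

# Boundary intercept: reported delta CI, hypothetical parameter CIs.
boundary_robust = is_robust(
    baseline_ci=Interval(0.60, 0.80),      # placeholder, not reported
    sensitivity_ci=Interval(0.62, 0.82),   # placeholder, not reported
    delta_ci=Interval(-0.081, 0.106),      # reported delta CI
)

# Hard difficulty effect on drift: reported delta CI excludes zero.
hard_robust = is_robust(
    baseline_ci=Interval(-1.60, -1.47),    # placeholder, not reported
    sensitivity_ci=Interval(-1.71, -1.56), # placeholder, not reported
    delta_ci=Interval(-0.183, -0.020),     # reported delta CI
)

print(boundary_robust, hard_robust)
```

Requiring both conditions makes the rule conservative: a parameter is flagged as sensitive as soon as either the delta CI excludes zero or the two parameter CIs fail to overlap.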
Sensitivity Analysis: Excluding Sub-Chance Participants

| Model | Parameter | Baseline | Excluded | Δ | Δ CI Lower | Δ CI Upper | CI Overlap | Δ CI Contains 0 |
|---|---|---|---|---|---|---|---|---|
| Additive (v + a + z) | Boundary (a): Intercept | 0.704 | 0.715 | 0.010 | -0.082 | 0.101 | TRUE | TRUE |
| Additive (v + a + z) | Drift (v): Difficulty: Hard | -1.534 | -1.635 | -0.101 | -0.184 | -0.019 | FALSE | FALSE |
| Additive (v + a + z) | Drift (v): Difficulty: Standard | -0.182 | -0.287 | -0.106 | -0.200 | -0.009 | FALSE | FALSE |
| Additive (v + a + z) | Drift (v): Effort: Low_5_MVC | 0.042 | 0.046 | 0.004 | -0.063 | 0.073 | TRUE | TRUE |
| Additive (v + a + z) | Drift (v): Intercept | 0.993 | 1.134 | 0.142 | -0.034 | 0.328 | TRUE | TRUE |
| Difficulty → v (drift) | Boundary (a): Intercept | 0.704 | 0.717 | 0.012 | -0.081 | 0.106 | TRUE | TRUE |
| Difficulty → v (drift) | Drift (v): Difficulty: Hard | -1.533 | -1.635 | -0.101 | -0.183 | -0.020 | FALSE | FALSE |
| Difficulty → v (drift) | Drift (v): Difficulty: Standard | -0.182 | -0.287 | -0.105 | -0.202 | -0.008 | FALSE | FALSE |
| Difficulty → v (drift) | Drift (v): Intercept | 1.016 | 1.156 | 0.141 | -0.037 | 0.322 | TRUE | TRUE |

Note: Baseline: N=67 (includes sub-chance). Excluded: N=55 (excludes sub-chance). Δ = Excluded - Baseline. If the Δ CI contains 0 and the CIs overlap, the parameter is considered robust.
Results: Most key parameters were robust to excluding sub-chance participants. In both Model3_Difficulty and Model4_Additive, the drift intercept and the boundary-separation intercept had delta CIs that included zero, as did the Low-effort drift effect in Model4_Additive, indicating no meaningful change. The Hard and Standard difficulty effects on drift each shifted by about -0.10, with delta CIs that did not include zero. For Hard, this is a small change in magnitude (~6.6% of the baseline estimate) and does not alter the substantive conclusion that drift is markedly reduced on Hard trials. For Standard, the shift is of the same absolute size but is large relative to the small baseline estimate (-0.182), so this coefficient should be read as more sensitive to the exclusion; its sign is unchanged.

Conclusion: The inclusion of sub-chance participants did not change the direction of any effect or the substantive conclusions, supporting our decision to retain all 67 participants to maximize sample size and leverage hierarchical modeling’s ability to stabilize estimates through shrinkage.
References
Note: The LC behavioral report manuscript (in preparation/published) describes the behavioral dataset and methodology used in this analysis. Full citation details will be added when available.